Windows Azure as a Platform as a Service (PaaS)


German-French Summer University for Young Researchers 2011
(Deutsch-Französische Sommeruniversität für Nachwuchswissenschaftler / Université d'été franco-allemande pour jeunes chercheurs)

Cloud Computing: Challenges and Opportunities

17.7. – 22.7.2011

Windows Azure as a Platform as a Service (PaaS)

Jared Jackson, Microsoft Research

Before we begin – Some Results

Vanilla 23

Chocolate 13

Strawberry 10

Coffee 3

Banana 3

Pistachio 7

Mango 3

Amarena 3

Malaga 3

Cherry 3

Tiramisu 3

Stracciatella 10

Cheesecake 3

Cookies and Cream 3

Walnut 3

Cinnamon 3

Favorite Ice Cream

Vanilla 33

Chocolate 11

Strawberry 5

Coffee 2

Cherry 2

Cookies and Cream 4

Butter Pecan 7

Neapolitan 4

Chocolate Chip 4

Other 29

Ice Cream Consumption

Source: International Ice Cream Association (makeicecream.com)

Windows Azure Overview

4

Web Application Model Comparison

Ad hoc application model:
• Machines running IIS / ASP.NET
• Machines running Windows Services
• Machines running SQL Server

Windows Azure application model:
• Web Role instances and Worker Role instances
• Azure Storage (Blob, Queue, Table)
• SQL Azure

Key Components

Fabric Controller
• Manages hardware and virtual machines for the service

Compute
• Web Roles – web application front end
• Worker Roles – utility compute
• VM Roles – custom compute role; you own and customize the VM

Storage
• Blobs – binary objects
• Tables – entity storage
• Queues – role coordination
• SQL Azure – SQL in the cloud

Key Components: Fabric Controller

• Think of it as an automated IT department
• A "cloud layer" on top of Windows Server 2008 and a custom version of Hyper-V called the Windows Azure Hypervisor, allowing automated management of virtual machines
• Its job is to provision, deploy, monitor, and maintain applications in data centers
• Applications have a "shape" and a "configuration"
  • The configuration definition describes the shape of a service: role types, role VM sizes, external and internal endpoints, local storage
  • The configuration settings configure a service: instance count, storage keys, application-specific settings
• Manages "nodes" and "edges" in the "fabric" (the hardware): power-on automation devices, routers, switches, hardware load balancers, physical servers, virtual servers
• State transitions: current state and goal state; it does whatever is needed to reach and maintain the goal state
• It is the perfect IT employee: never sleeps, never asks for a raise, and always does what you tell it to do in the configuration definition and settings

Creating a New Project

Windows Azure Compute

Key Components – Compute: Web Roles

Web front end:
• Cloud web server
• Web pages
• Web services

You can create the following types:
• ASP.NET web roles
• ASP.NET MVC 2 web roles
• WCF service web roles
• Worker roles
• CGI-based web roles

Key Components – Compute: Worker Roles

• Utility compute on Windows Server 2008
• Background processing
• Each role can define an amount of local storage: protected space on the local drive, considered volatile storage
• May communicate with outside services: Azure Storage, SQL Azure, other web services
• Can expose external and internal endpoints

Suggested Application Model: Using Queues for Reliable Messaging

Scalable, fault-tolerant applications: queues are the application glue (see the sketch below).
• Decouple parts of the application, so they are easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)
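In code, the work-ticket pattern looks roughly like the following. This is a minimal sketch assuming the Microsoft.WindowsAzure.StorageClient library of this SDK generation; the queue name, the AccountInformation helper (which appears later in this deck), and ProcessWorkItem are illustrative placeholders.

// Shared setup (web role and worker role)
CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);
CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
queue.CreateIfNotExist();

// Web role: enqueue a small "work ticket", e.g. the name of a blob to process
queue.AddMessage(new CloudQueueMessage("uploads/order-12345.xml"));

// Worker role: dequeue, process, and only then delete
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));   // invisible for 30 s
if (msg != null)
{
    ProcessWorkItem(msg.AsString);   // hypothetical processing routine
    queue.DeleteMessage(msg);        // if the worker crashes first, the message reappears
}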

Key Components – Compute: VM Roles

• Customized role: you own the box
• How it works:
  • Download the "Guest OS" to Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differences VHD
  • Azure runs your VM role using the base OS plus the differences VHD

Application Hosting

'Grokking' the service model

• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy, and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action – perhaps even relocate your app
• At all times, the 'diagram' stays whole

Automated Service Management

Provide code + service model:
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  • ServiceDefinition.csdef (the service definition)
  • ServiceConfiguration.cscfg (the service configuration)
• GUI: double-click the role name in the Azure project to edit both
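At run time, a role reads the values defined in these files through the service runtime API. A minimal sketch, assuming the Microsoft.WindowsAzure.ServiceRuntime assembly; the setting name DataConnectionString is an assumed example:

using Microsoft.WindowsAzure.ServiceRuntime;

// Read a setting declared in ServiceDefinition.csdef and valued in ServiceConfiguration.cscfg
string connection = RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");

// Configuration can be changed in the portal without redeploying; roles are notified here
RoleEnvironment.Changing += (sender, e) =>
{
    // set e.Cancel = true to recycle the role instead of applying the change in place
};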

Deploying to the cloud

• We can deploy from the portal or from a script
• Visual Studio builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes (which is better than six months)

Service Management API

• REST-based API to manage your services
• X.509 certificates for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community- and MSFT-built tools around the API – easy to roll your own
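Rolling your own is mostly plumbing: attach a management certificate and call the REST endpoints. A rough sketch; the subscription ID placeholder, certificate path, and x-ms-version value are assumptions for illustration:

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

X509Certificate2 cert = new X509Certificate2(@"C:\certs\management.pfx", "password");
string uri = "https://management.core.windows.net/<subscription-id>/services/hostedservices";

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "GET";
request.Headers.Add("x-ms-version", "2010-10-28");   // API version header required by the service
request.ClientCertificates.Add(cert);                // the uploaded management certificate

using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());            // XML list of hosted services
}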

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage, At Massive Scale

Blob – massive files, e.g. videos, logs
Drive – use standard file system APIs
Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
Queues – facilitate loosely-coupled, reliable systems

Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood

• Block blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
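For large uploads, a block blob is written as individual blocks and then committed as a list. A rough sketch with the StorageClient-era API; the container variable, file path, and 4 MB block size are illustrative assumptions:

CloudBlockBlob blob = container.GetBlockBlobReference("videos/lecture.wmv");
List<string> blockIds = new List<string>();
byte[] buffer = new byte[4 * 1024 * 1024];            // upload in 4 MB blocks
int read, index = 0;

using (FileStream file = File.OpenRead(@"C:\data\lecture.wmv"))
{
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        string blockId = Convert.ToBase64String(BitConverter.GetBytes(index++));
        blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);   // stage one block
        blockIds.Add(blockId);
    }
}
blob.PutBlockList(blockIds);                           // commit the staged blocks as one blob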

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount the Page Blob http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
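Mounting looks roughly like this. A sketch only, assuming the Microsoft.WindowsAzure.CloudDrive assembly and a local-storage resource named DriveCache declared in the service definition; the container, VHD blob name, and sizes are made up:

CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

CloudBlobClient blobClient = account.CreateCloudBlobClient();
blobClient.GetContainerReference("drives").CreateIfNotExist();
string vhdUri = blobClient.GetContainerReference("drives").GetPageBlobReference("mydata.vhd").Uri.ToString();

CloudDrive drive = account.CreateCloudDrive(vhdUri);                  // a Page Blob holding a VHD
try { drive.Create(1024); } catch (CloudDriveException) { }           // 1 GB; ignore "already exists"

string drivePath = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
File.WriteAllText(Path.Combine(drivePath, "hello.txt"), "NTFS backed by a Page Blob");
drive.Unmount();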

Windows Azure Tables

• Provides structured storage
• Massively scalable tables: billions of entities (rows) and TBs of data; can use thousands of servers as traffic grows
• Highly available & durable: data is replicated several times
• Familiar and easy-to-use API:
  • WCF Data Services and OData; .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message is processed at least once
• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance.

Every data object has a partition key:
• The key is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load balances partitions based on traffic patterns

The partition key controls entity locality and is the unit of scale:
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy": the system load balances to meet your traffic needs, and "Server Busy" means a single partition's limits have been reached

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition.

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

Blobs – Container name + Blob name: every blob and its snapshots are in a single partition.

Container Name | Blob Name
image          | annarborbighouse.jpg
image          | foxboroughgillette.jpg
video          | annarborbighouse.jpg

Messages – Queue name: all messages for a single queue belong to the same partition.

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage account:
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single queue/table partition:
• Up to 500 transactions per second

Single blob partition:
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.
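A minimal retry sketch for those '503 Server Busy' responses, using the StorageClient-era exception type; the blob call and payload stand in for any throttled storage operation, and the delay values and retry cap are arbitrary choices:

int attempt = 0;
TimeSpan delay = TimeSpan.FromMilliseconds(500);
while (true)
{
    try
    {
        blob.UploadText(payload);   // any storage call that may be throttled
        break;
    }
    catch (StorageClientException e)
    {
        if ((int)e.StatusCode != 503 || ++attempt > 5)
            throw;                                   // only retry "Server Busy", and not forever
        Thread.Sleep(delay);
        delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);   // exponential backoff
    }
}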

Partitions and Partition Ranges

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

The same table split into two partition ranges, each of which can be served by a different server:
• Range 1: Action … Animation (Fast & Furious through The Ant Bully)
• Range 2: Comedy … War (Office Space through Defiance)

Key Selection: Things to Consider

Scalability:
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query efficiency & speed:
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions:
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A query returns a continuation token at:
• A maximum of 1,000 rows in a response
• The end of a partition range boundary
• A maximum of 5 seconds to execute the query
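With the StorageClient library, wrapping a LINQ query with AsTableServiceQuery() gives you a query object whose execution follows continuation tokens for you as you enumerate. A small sketch; the context variable, table, and entity type reuse the AttendeeEntity / TableHelper sample that appears later in this deck:

CloudTableQuery<AttendeeEntity> query =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool"
     select a).AsTableServiceQuery();

foreach (AttendeeEntity attendee in query.Execute())   // may span several round trips
{
    Console.WriteLine(attendee.Email);                  // each segment is fetched as needed
}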

Tables Recap

• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Guidelines:
• Select a PartitionKey and RowKey that help scale – distribute load by using a hash etc. as a prefix
• Avoid "append only" patterns
• Always handle continuation tokens – expect them for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries on "Server Busy" – partitions are load balanced to meet traffic needs, and a busy response means the load on a single partition has exceeded its limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
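For example, a point query against the attendee table could look like this sketch; tableClient is a CloudTableClient as in the TableHelper sample later in the deck, and the key values are the deck's own examples:

TableServiceContext context = tableClient.GetDataServiceContext();   // a fresh context per operation
context.IgnoreResourceNotFoundException = true;                      // missing entity -> null, not an exception

AttendeeEntity attendee =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.org"
     select a).FirstOrDefault();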

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
  • Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1.
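A minimal worker-role polling loop in that style; the 1-second floor, 60-second ceiling, and ProcessWorkItem helper are illustrative choices:

TimeSpan interval = TimeSpan.FromSeconds(1);
while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        ProcessWorkItem(msg.AsString);       // hypothetical processing routine
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);  // successful poll: reset the interval
    }
    else
    {
        Thread.Sleep(interval);              // empty poll: wait, then double (truncated at 60 s)
        interval = TimeSpan.FromSeconds(Math.Min(interval.TotalSeconds * 2, 60));
    }
}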

Removing Poison Messages

Scenario: producers P1 and P2 put messages on queue Q; consumers C1 and C2 process them. The original slides show the queue contents and each message's dequeue count at every step.

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) – the repeatedly failing ("poison") message is removed instead of being retried forever
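In code, the guard from step 12 is just a check on the message's dequeue count before processing. A minimal sketch (a real system might first copy the message to a dead-letter blob or table, which is an added assumption here, as is ProcessWorkItem):

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        queue.DeleteMessage(msg);        // poison message: stop retrying it
    }
    else
    {
        ProcessWorkItem(msg.AsString);   // hypothetical processing routine
        queue.DeleteMessage(msg);
    }
}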

Queues Recap

• Make message processing idempotent – then there is no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages – enforce a threshold on a message's dequeue count
• Messages > 8 KB – use a blob to store the message data, with a reference in the message
• Batch messages
• Garbage collect orphaned blobs
• Use the message count to scale – dynamically increase or reduce the number of workers

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

49

A Quick Exercise

…Then let's look at some code and some tools

50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance using up CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism (see the sketch below)
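A small .NET 4 Task Parallel Library sketch illustrating both flavors; the work-item collection and method names are placeholders:

// Data parallelism: spread independent work items across all cores
Parallel.ForEach(workItems, item => ProcessWorkItem(item));

// Task parallelism: run independent jobs concurrently and wait for both
Task render = Task.Factory.StartNew(() => RenderThumbnails());
Task index  = Task.Factory.StartNew(() => RebuildSearchIndex());
Task.WaitAll(render, index);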

Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• It is a trade-off between the risk of failure or poor user experience due to not having excess capacity, and the cost of having idling VMs (performance vs. cost)

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places
• Sending fewer things also means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, with JavaScript, CSS, and images minified, to produce compressed content.]
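A sketch of gzip-compressing ASP.NET output when the client advertises support; hooking this into Application_BeginRequest in Global.asax is one assumed place to do it (System.Web and System.IO.Compression namespaces):

HttpContext context = HttpContext.Current;
string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? "";
if (acceptEncoding.Contains("gzip"))
{
    // Wrap the response stream so everything written to it is gzip-compressed on the fly
    context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
    context.Response.AppendHeader("Content-Encoding", "gzip");
}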

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • A Hadoop implementation; Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive:
• Large number of pairwise alignment operations
• A BLAST run can take 700 – 1000 CPU hours
• Sequence databases are growing exponentially: GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:
• Segment the input – segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST) – needs special result-reduction processing

Large volume of data:
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multi-core capability of each instance:
• The "-a" argument of NCBI-BLAST is set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity:
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
• Best practice: test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task:
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs; watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans out into many BLAST tasks, which are combined by a merging task.]

Task size vs. performance:
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance:
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost:
• The extra-large instance generated the best and the most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler and scaling engine, backed by an Azure Table job registry; worker roles pull BLAST tasks from a global dispatch queue; Azure Blob storage holds the NCBI/BLAST databases, temporary data, and results; a database-updating role keeps the reference databases current.]


Web portal and web service:
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the web portal and web service pass job registrations to the job scheduler and scaling engine, which record them in the job registry.]

Case study: R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)

• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering Homologs

• Discover the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
  • The protein database is large (4.2 GB)
  • In total, 9,865,668 sequences to be queried
  • Theoretically, 100 billion sequence comparisons
• Performance estimation: based on a sampling run on one extra-large Azure instance, it would require 3,216,731 minutes (6.1 years) on one desktop
• One of the biggest BLAST jobs as far as we know – this scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters – US (2), Western and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually

[Figure: the eight AzureBLAST deployments, with 50 – 62 worker instances each.]

• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6 – 8 days
• Look into the log data to analyze what took place…


A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise, something is wrong (e.g. the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced a blob writing failure at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

Penman-Monteith (1964):

ET = \frac{\Delta R_n + \rho_a c_p \,\delta q \, g_a}{\left(\Delta + \gamma \left(1 + g_a / g_s\right)\right)\lambda_v}

where:
• ET = water volume evapotranspired (m^3 s^-1 m^-2)
• Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
• λv = latent heat of vaporization (J/g)
• Rn = net radiation (W m^-2)
• cp = specific heat capacity of air (J kg^-1 K^-1)
• ρa = dry air density (kg m^-3)
• δq = vapor pressure deficit (Pa)
• ga = conductivity of air (inverse of ra) (m s^-1)
• gs = conductivity of plant stoma to air (inverse of rs) (m s^-1)
• γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage:
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage:
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS service web role portal accepts requests via a request queue; the data collection stage downloads source imagery from NASA download sites (download queue, source metadata); the reprojection stage, derivation reduction stage (reduction 1 queue), and analysis reduction stage (reduction 2 queue) follow; scientists download the scientific results.]

httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The ModisAzure Service is the Web Role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks – recoverable units of work
  • The execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage>Task queue.]

All work is actually done by a generic Worker Role:
• Sandboxes science or other executables
• Marshalls all storage from/to Azure blob storage to/from local Azure worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: GenericWorker (Worker Role) instances pull from the <PipelineStage>Task queue, persist <PipelineStage>TaskStatus, and read <Input>Data storage.]

[Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus from the job queue, parses and persists ReprojectionTaskStatus, and dispatches to the task queue; GenericWorker (Worker Role) instances read swath source data storage and write reprojection data storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost figure for the MODISAzure pipeline (AzureMODIS service web role portal, download/reprojection/reduction queues, source imagery download sites, scientific results download):]
• Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
• Derivation reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage
• Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled", data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

  • Day 2 - Azure as a PaaS
  • Day 2 - Applications

    Before we begin ndash Some Results

    Vanilla 23

    Chocolate 13

    Strawberry 10

    Coffee 3

    Banana 3

    Pistachio 7

    Mango 3

    Amarena 3

    Malaga 3

    Cherry 3

    Tiramisu 3

    Stratiatella 10

    Cheescake 3

    Cookies and Cream

    3

    Walnut 3

    Cinamon 3

    Favorite Ice Cream

    Vanilla 33

    Chocolate 11

    Strawberry 5

    Coffee 2

    Cherry 2

    Cookies and Cream 4

    Butter Pecan 7

    Neapolitan 4

    Chocolate Chip 4

    Other 29

    Ice Cream Consumption

    Source International Ice Cream Association (makeicecreamcom)

    Windows Azure Overview

    4

    Web Application Model Comparison

    Machines Running IIS ASPNET

    Machines Running Windows Services

    Machines Running SQL Server

    Ad Hoc Application Model

    5

    Web Application Model Comparison

    Machines Running IIS ASPNET

    Machines Running Windows Services

    Machines Running SQL Server

    Ad Hoc Application Model

    Web Role Instances Worker Role

    Instances

    Azure Storage Blob Queue Table

    SQL Azure

    Windows Azure Application Model

    Key Components Fabric Controller

    bull Manages hardware and virtual machines for service

    Compute bull Web Roles

    bull Web application front end

    bull Worker Roles bull Utility compute

    bull VM Roles bull Custom compute role bull You own and customize the VM

    Storage bull Blobs

    bull Binary objects

    bull Tables bull Entity storage

    bull Queues bull Role coordination

    bull SQL Azure bull SQL in the cloud

    Key Components Fabric Controller

    bull Think of it as an automated IT department

    bull ldquoCloud Layerrdquo on top of

    bull Windows Server 2008

    bull A custom version of Hyper-V called the Windows Azure Hypervisor

    bull Allows for automated management of virtual machines

    Key Components Fabric Controller

    bull Think of it as an automated IT department bull ldquoCloud Layerrdquo on top of

    bull Windows Server 2008

    bull A custom version of Hyper-V called the Windows Azure Hypervisor bull Allows for automated management of virtual machines

    bull Itrsquos job is to provision deploy monitor and maintain applications in data centers

    bull Applications have a ldquoshaperdquo and a ldquoconfigurationrdquo bull The configuration definition describes the shape of a service

    bull Role types

    bull Role VM sizes

    bull External and internal endpoints

    bull Local storage

    bull The configuration settings configures a service bull Instance count

    bull Storage keys

    bull Application-specific settings

    Key Components Fabric Controller

    bull Manages ldquonodesrdquo and ldquoedgesrdquo in the ldquofabricrdquo (the hardware) bull Power-on automation devices bull Routers Switches bull Hardware load balancers bull Physical servers bull Virtual servers

    bull State transitions bull Current State bull Goal State bull Does what is needed to reach and maintain the goal state

    bull Itrsquos a perfect IT employee bull Never sleeps bull Doesnrsquot ever ask for raise bull Always does what you tell it to do in configuration definition and settings

    Creating a New Project

    Windows Azure Compute

    Key Components ndash Compute Web Roles

    Web Front End

    bull Cloud web server

    bull Web pages

    bull Web services

    You can create the following types

    bull ASPNET web roles

    bull ASPNET MVC 2 web roles

    bull WCF service web roles

    bull Worker roles

    bull CGI-based web roles

    Key Components ndash Compute Worker Roles

    bull Utility compute

    bull Windows Server 2008

    bull Background processing

    bull Each role can define an amount of local storage

    bull Protected space on the local drive considered volatile storage

    bull May communicate with outside services

    bull Azure Storage

    bull SQL Azure

    bull Other Web services

    bull Can expose external and internal endpoints

    Suggested Application Model Using queues for reliable messaging

    Scalable Fault Tolerant Applications

    Queues are the application glue bull Decouple parts of application easier to scale independently bull Resource allocation different priority queues and backend servers bull Mask faults in worker roles (reliable messaging)

    Key Components ndash Compute VM Roles

    bull Customized Role

    bull You own the box

    bull How it works

    bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

    bull Customize the OS as you need to

    bull Upload the differences VHD

    bull Azure runs your VM role using

    bull Base OS

    bull Differences VHD

    Application Hosting

    lsquoGrokkingrsquo the service model

    bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

    bull The service model is the same diagram written down in a declarative format

    bull You give the Fabric the service model and the binaries that go with each of those nodes

    bull The Fabric can provision deploy and manage that diagram for you

    bull Find hardware home

    bull Copy and launch your app binaries

    bull Monitor your app and the hardware

    bull In case of failure take action Perhaps even relocate your app

    bull At all times the lsquodiagramrsquo stays whole

    Automated Service Management Provide code + service model

    bull Platform identifies and allocates resources deploys the service manages service health

    bull Configuration is handled by two files

    ServiceDefinitioncsdef

    ServiceConfigurationcscfg

    Service Definition

    Service Configuration

    GUI

    Double click on Role Name in Azure Project

    Deploying to the cloud

    bull We can deploy from the portal or from script

    bull VS builds two files

    bull Encrypted package of your code

    bull Your config file

    bull You must create an Azure account then a service and then you deploy your code

    bull Can take up to 20 minutes

    bull (which is better than six months)

    Service Management API

    bullREST based API to manage your services

    bullX509-certs for authentication

    bullLets you create delete change upgrade swaphellip

    bullLots of community and MSFT-built tools around the API

    - Easy to roll your own

    The Secret Sauce ndash The Fabric

    The Fabric is the lsquobrainrsquo behind Windows Azure

    1 Process service model

    1 Determine resource requirements

    2 Create role images

    2 Allocate resources

    3 Prepare nodes

    1 Place role images on nodes

    2 Configure settings

    3 Start roles

    4 Configure load balancers

    5 Maintain service health

    1 If role fails restart the role based on policy

    2 If node fails migrate the role based on policy

    Storage

    Durable Storage At Massive Scale

    Blob

    - Massive files eg videos logs

    Drive

    - Use standard file system APIs

    Tables - Non-relational but with few scale limits

    - Use SQL Azure for relational data

    Queues

    - Facilitate loosely-coupled reliable systems

    Blob Features and Functions

    bull Store Large Objects (up to 1TB in size)

    bull Can be served through Windows Azure CDN service

    bull Standard REST Interface

    bull PutBlob

    bull Inserts a new blob overwrites the existing blob

    bull GetBlob

    bull Get whole blob or a specific range

    bull DeleteBlob

    bull CopyBlob

    bull SnapshotBlob

    bull LeaseBlob

    Two Types of Blobs Under the Hood

    bull Block Blob

    bull Targeted at streaming workloads

    bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

    bull Size limit 200GB per blob

    bull Page Blob

    bull Targeted at random readwrite workloads

    bull Each blob consists of an array of pages bull Each page is identified by its offset

    from the start of the blob

    bull Size limit 1TB per blob

    Windows Azure Drive

    bull Provides a durable NTFS volume for Windows Azure applications to use

    bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

    bull Enables migrating existing NTFS applications to the cloud

    bull A Windows Azure Drive is a Page Blob

    bull Example mount Page Blob as X bull httpltaccountnamegtblobcorewindowsnetltcontainernamegtltblobnamegt

    bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

    bull Drive persists even when not mounted as a Page Blob

    Windows Azure Tables

    bull Provides Structured Storage

    bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

    bull Can use thousands of servers as traffic grows

    bull Highly Available amp Durable bull Data is replicated several times

    bull Familiar and Easy to use API

    bull WCF Data Services and OData bull NET classes and LINQ

    bull REST ndash with any platform or language

    Windows Azure Queues

    bull Queue are performance efficient highly available and provide reliable message delivery

    bull Simple asynchronous work dispatch

    bull Programming semantics ensure that a message can be processed at least once

    bull Access is provided via REST

    Storage Partitioning

    Understanding partitioning is key to understanding performance

    bull Different for each data type (blobs entities queues) Every data object has a partition key

    bull A partition can be served by a single server

    bull System load balances partitions based on traffic pattern

    bull Controls entity locality Partition key is unit of scale

    bull Load balancing can take a few minutes to kick in

    bull Can take a couple of seconds for partition to be available on a different server

    System load balances

    bull Use exponential backoff on ldquoServer Busyrdquo

    bull Our system load balances to meet your traffic needs

    bull Single partition limits have been reached Server Busy

    Partition Keys In Each Abstraction

    bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

    PartitionKey (CustomerId) RowKey (RowKind)

    Name CreditCardNumber OrderTotal

    1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

    1 Order ndash 1 $3512

    2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

    2 Order ndash 3 $1000

    bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

    bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

    Container Name Blob Name

    image annarborbighousejpg

    image foxboroughgillettejpg

    video annarborbighousejpg

    Queue Message

    jobs Message1

    jobs Message2

    workflow Message1

    Scalability Targets

    Storage Account

    bull Capacity ndash Up to 100 TBs

    bull Transactions ndash Up to a few thousand requests per second

    bull Bandwidth ndash Up to a few hundred megabytes per second

    Single QueueTable Partition

    bull Up to 500 transactions per second

    To go above these numbers partition between multiple storage accounts and partitions

    When limit is hit app will see lsquo503 server busyrsquo applications should implement exponential backoff

    Single Blob Partition

    bull Throughput up to 60 MBs

    PartitionKey (Category)

    RowKey (Title)

    Timestamp ReleaseDate

    Action Fast amp Furious hellip 2009

    Action The Bourne Ultimatum hellip 2007

    hellip hellip hellip hellip

    Animation Open Season 2 hellip 2009

    Animation The Ant Bully hellip 2006

    PartitionKey (Category)

    RowKey (Title)

    Timestamp ReleaseDate

    Comedy Office Space hellip 1999

    hellip hellip hellip hellip

    SciFi X-Men Origins Wolverine hellip 2009

    hellip hellip hellip hellip

    War Defiance hellip 2008

    PartitionKey (Category)

    RowKey (Title)

    Timestamp ReleaseDate

    Action Fast amp Furious hellip 2009

    Action The Bourne Ultimatum hellip 2007

    hellip hellip hellip hellip

    Animation Open Season 2 hellip 2009

    Animation The Ant Bully hellip 2006

    hellip hellip hellip hellip

    Comedy Office Space hellip 1999

    hellip hellip hellip hellip

    SciFi X-Men Origins Wolverine hellip 2009

    hellip hellip hellip hellip

    War Defiance hellip 2008

    Partitions and Partition Ranges

    Key Selection Things to Consider

    bullDistribute load as much as possible bullHot partitions can be load balanced bullPartitionKey is critical for scalability

    See httpwwwmicrosoftpdccom2009SVC09 and httpazurescopecloudappnet for more information

    bull Avoid frequent large scans bull Parallelize queries bull Point queries are most efficient

    bullTransactions across a single partition bullTransaction semantics amp Reduce round trips

    Scalability

    Query Efficiency amp Speed

    Entity group transactions

    Expect Continuation Tokens ndash Seriously

    Maximum of 1000 rows in a response

    At the end of partition range boundary

    Maximum of 1000 rows in a response

    At the end of partition range boundary

    Maximum of 5 seconds to execute the query

    Tables Recap bullEfficient for frequently used queries

    bullSupports batch transactions

    bullDistributes load

    Select PartitionKey and RowKey that help scale

    Avoid ldquoAppend onlyrdquo patterns

    Always Handle continuation tokens

    ldquoORrdquo predicates are not optimized

    Implement back-off strategy for retries

    bullDistribute by using a hash etc as prefix

    bullExpect continuation tokens for range queries

    bullExecute the queries that form the ldquoORrdquo predicates as separate queries

    bullServer busy

    bullLoad balance partitions to meet traffic needs

    bullLoad on single partition has exceeded the limits

    WCF Data Services

    bullUse a new context for each logical operation

    bullAddObjectAttachTo can throw exception if entity is already being tracked

    bullPoint query throws an exception if resource does not exist Use IgnoreResourceNotFoundException

    Queues Their Unique Role in Building Reliable Scalable Applications

    bull Want roles that work closely together but are not bound together bull Tight coupling leads to brittleness

    bull This can aid in scaling and performance

    bull A queue can hold an unlimited number of messages bull Messages must be serializable as XML

    bull Limited to 8KB in size

    bull Commonly use the work ticket pattern

    bull Why not simply use a table

    Queue Terminology

    Message Lifecycle

    Queue

    Msg 1

    Msg 2

    Msg 3

    Msg 4

    Worker Role

    Worker Role

    PutMessage

    Web Role

    GetMessage (Timeout) RemoveMessage

    Msg 2 Msg 1

    Worker Role

    Msg 2

    POST httpmyaccountqueuecorewindowsnetmyqueuemessages

    HTTP11 200 OK Transfer-Encoding chunked Content-Type applicationxml Date Tue 09 Dec 2008 210430 GMT Server Nephos Queue Service Version 10 Microsoft-HTTPAPI20

    ltxml version=10 encoding=utf-8gt ltQueueMessagesListgt ltQueueMessagegt ltMessageIdgt5974b586-0df3-4e2d-ad0c-18e3892bfca2ltMessageIdgt ltInsertionTimegtMon 22 Sep 2008 232920 GMTltInsertionTimegt ltExpirationTimegtMon 29 Sep 2008 232920 GMTltExpirationTimegt ltPopReceiptgtYzQ4Yzg1MDIGM0MDFiZDAwYzEwltPopReceiptgt ltTimeNextVisiblegtTue 23 Sep 2008 052920GMTltTimeNextVisiblegt ltMessageTextgtPHRlc3Q+dGdGVzdD4=ltMessageTextgt ltQueueMessagegt ltQueueMessagesListgt

    DELETE httpmyaccountqueuecorewindowsnetmyqueuemessagesmessageidpopreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

    Truncated Exponential Back Off Polling

    Consider a backoff polling approach Each empty poll

    increases interval by 2x

    A successful sets the interval back to 1

    44

    2 1

    1 1

    C1

    C2

    Removing Poison Messages

    1 1

    2 1

    3 4 0

    Producers Consumers

    P2

    P1

    3 0

    2 GetMessage(Q 30 s) msg 2

    1 GetMessage(Q 30 s) msg 1

    1 1

    2 1

    1 0

    2 0

    45

    C1

    C2

    Removing Poison Messages

    3 4 0

    Producers Consumers

    P2

    P1

    1 1

    2 1

    2 GetMessage(Q 30 s) msg 2 3 C2 consumed msg 2 4 DeleteMessage(Q msg 2) 7 GetMessage(Q 30 s) msg 1

    1 GetMessage(Q 30 s) msg 1 5 C1 crashed

    1 1

    2 1

    6 msg1 visible 30 s after Dequeue 3 0

    1 2

    1 1

    1 2

    46

    C1

    C2

    Removing Poison Messages

    3 4 0

    Producers Consumers

    P2

    P1

    1 2

    2 Dequeue(Q 30 sec) msg 2 3 C2 consumed msg 2 4 Delete(Q msg 2) 7 Dequeue(Q 30 sec) msg 1 8 C2 crashed

    1 Dequeue(Q 30 sec) msg 1 5 C1 crashed 10 C1 restarted 11 Dequeue(Q 30 sec) msg 1 12 DequeueCount gt 2 13 Delete (Q msg1) 1

    2

    6 msg1 visible 30s after Dequeue 9 msg1 visible 30s after Dequeue

    3 0

    1 3

    1 2

    1 3

    Queues Recap

    bullNo need to deal with failures Make message

    processing idempotent

    bullInvisible messages result in out of order Do not rely on order

    bullEnforce threshold on messagersquos dequeue count Use Dequeue count to remove

    poison messages

    bullMessages gt 8KB

    bullBatch messages

    bullGarbage collect orphaned blobs

    bullDynamically increasereduce workers

    Use blob to store message data with

    reference in message

    Use message count to scale

    bullNo need to deal with failures

    bullInvisible messages result in out of order

    bullEnforce threshold on messagersquos dequeue count

    bullDynamically increasereduce workers

    Windows Azure Storage Takeaways

    Blobs

    Drives

    Tables

    Queues

    httpblogsmsdncomwindowsazurestorage

    httpazurescopecloudappnet

    49

    A Quick Exercise

…Then let's look at some code and some tools

    50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

    51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();
            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }
}

    52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

    53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available.
        return null;
    }
}

    54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, at http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

    55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

    56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

    57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

    58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception.
    }
}

    59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

    60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

    61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

    62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

    63

Tools – Fiddler2

    Best Practices

    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

    Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting code into multiple roles, each not using up its CPU

• Balance using up the CPU vs. having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

    Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• This may not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
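A minimal Task Parallel Library sketch of both flavors; the work function is a placeholder.

using System;
using System.Threading.Tasks;

public class ConcurrencySketch
{
    public static void Run()
    {
        // Data parallelism: apply the same operation to many items,
        // letting the TPL partition the work across the VM's cores.
        int[] workItems = new int[10000];
        Parallel.For(0, workItems.Length, i =>
        {
            workItems[i] = DoCpuBoundWork(i);
        });

        // Task parallelism: run independent operations concurrently
        // and wait for all of them to finish.
        Task<int> t1 = Task.Factory.StartNew(() => DoCpuBoundWork(1));
        Task<int> t2 = Task.Factory.StartNew(() => DoCpuBoundWork(2));
        Task.WaitAll(t1, t2);
        Console.WriteLine(t1.Result + t2.Result);
    }

    private static int DoCpuBoundWork(int seed)
    {
        // Placeholder for application-specific computation.
        return seed * seed;
    }
}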

    Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find pieces of code that are intensive in different resources and have them live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

    Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• There is a trade-off between the risk of failure or a poor user experience from not having excess capacity, and the cost of having idling VMs

    Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile

• Caching and compressing help a lot with storage costs

    Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth often leads to savings in other places

Sending fewer things also means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

    Compressing Content

1. Gzip all output content (a minimal sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification and image minification to become compressed content.]
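A minimal gzip sketch using System.IO.Compression; the helper names are made up. When serving the result over HTTP you would also set the Content-Encoding: gzip header.

using System.IO;
using System.IO.Compression;
using System.Text;

public static class ContentCompression
{
    // Gzip a string (e.g. HTML, JSON, or CSV output) before storing or serving it.
    public static byte[] GzipText(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverse operation, e.g. for content pulled back out of blob storage.
    public static string GunzipText(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}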

    Best Practices Summary

Doing 'less' is the key to saving costs

    Measure everything

    Know your application profile in and out

    Research Examples in the Cloud

…on another set of slides

    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

    76

    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

    77

Questions and Discussion…

    Thank you for hosting me at the Summer School

    BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive:

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:

• Segment the input
  • Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volumes of data:

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth demand could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance:
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity:
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task:
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before retry in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (a small sketch follows)
• Watch out for the 2-hour maximum limitation
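A minimal sketch of deriving the visibility timeout from the partition size; the per-kilobase rate and safety factor are hypothetical calibration values that would come from test runs, not figures from AzureBLAST itself.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastTaskDequeue
{
    // Hypothetical calibration from test runs: estimated seconds of BLAST time
    // per thousand pair-bases in a partition, plus a safety factor.
    private const double SecondsPerKiloBase = 0.5;
    private const double SafetyFactor = 1.5;

    public static CloudQueueMessage DequeueBlastTask(CloudQueue queue, long pairBasesInPartition)
    {
        double estimatedSeconds = (pairBasesInPartition / 1000.0) * SecondsPerKiloBase * SafetyFactor;

        // The queue service caps the visibility timeout at 2 hours.
        double maxSeconds = TimeSpan.FromHours(2).TotalSeconds;
        TimeSpan visibilityTimeout = TimeSpan.FromSeconds(Math.Min(estimatedSeconds, maxSeconds));

        return queue.GetMessage(visibilityTimeout);
    }
}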

[Diagram: a splitting task fans the input out to many parallel BLAST tasks, whose outputs are combined by a merging task.]

Task size vs. performance:
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance:
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost:
• The extra-large instance generated the best and the most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and feeds a global dispatch queue consumed by the Worker instances; a database updating Role refreshes the reference data; Azure Tables hold the Job Registry and NCBI database metadata, and Azure Blob storage holds the BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance:
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored into the job registry table:
• Fault tolerance: avoid in-memory state

[Diagram: the Web Portal and Web Service hand job registrations to the Job Scheduler, which works with the Scaling Engine and records jobs in the Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query:
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation:
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know:
• Experiments of this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When the load becomes imbalanced, the load is redistributed manually

[Chart: worker VM count per deployment – the eight deployments ran with roughly 50-62 extra-large VMs each.]

• The total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, about 30 minutes apart) – this is an update domain rolling through

• 35 nodes experienced a blob-writing failure at the same time

West Europe Data Center: 30,976 tasks were completed and then the job was killed

• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m^3 s^-1 m^-2)

Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)

λv = latent heat of vaporization (J g^-1)

Rn = net radiation (W m^-2)

cp = specific heat capacity of air (J kg^-1 K^-1)

ρa = dry air density (kg m^-3)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s^-1)

gs = conductivity of plant stoma air (inverse of rs) (m s^-1)

γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:

• Lots of inputs; a big data reduction

• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
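As a worked example, a reduction stage could evaluate this formula directly; the sketch below simply transcribes the equation above, with parameters in the units listed, and is not the project's actual implementation.

using System;

public static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    public static double ComputeEvapotranspiration(
        double delta,    // Δ: Pa/K
        double rn,       // Rn: W/m^2
        double rhoA,     // ρa: kg/m^3
        double cp,       // cp: J/(kg·K)
        double deltaQ,   // δq: Pa
        double ga,       // ga: m/s
        double gs,       // gs: m/s
        double lambdaV,  // λv: J/g
        double gamma)    // γ: Pa/K, roughly 66
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}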

    Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input datasets:

• NASA MODIS imagery source archives: 5 TB (600K files)

• FLUXNET curated sensor dataset: 30 GB (960 files)

• FLUXNET curated field dataset: 2 KB (1 file)

• NCEP/NCAR: ~100 MB (4K files)

• Vegetative clumping: ~5 MB (1 file)

• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage:
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage:
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests on a Request Queue; the Data Collection Stage pulls source imagery from download sites via a Download Queue; the Reprojection Stage (Reprojection Queue), Derivation Reduction Stage (Reduction 1 Queue) and Analysis Reduction Stage (Reduction 2 Queue) follow, drawing on Source Metadata; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue.]

All the work is actually done by a GenericWorker (Worker Role):

• Sandboxes the science or other executable

• Marshals all storage to/from Azure blob storage to/from local files on the Azure worker instance

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, from which GenericWorker (Worker Role) instances pull tasks and access <Input>Data Storage.]

Reprojection request flow:

[Diagram: a Reprojection Request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus entries and dispatches work through the Job Queue and Task Queue to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request

• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)

• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile

• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run reductions multiple times

• Storage costs are driven by the data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

Approximate scale and cost per pipeline stage (figures associated with stages in the order they appear in the diagram):

• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage

• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download

• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage

• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed, and they have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers


      Windows Azure Overview

      4

      Web Application Model Comparison

      Machines Running IIS ASPNET

      Machines Running Windows Services

      Machines Running SQL Server

      Ad Hoc Application Model

      5

      Web Application Model Comparison

      Machines Running IIS ASPNET

      Machines Running Windows Services

      Machines Running SQL Server

      Ad Hoc Application Model

      Web Role Instances Worker Role

      Instances

      Azure Storage Blob Queue Table

      SQL Azure

      Windows Azure Application Model

      Key Components Fabric Controller

      bull Manages hardware and virtual machines for service

      Compute bull Web Roles

      bull Web application front end

      bull Worker Roles bull Utility compute

      bull VM Roles bull Custom compute role bull You own and customize the VM

      Storage bull Blobs

      bull Binary objects

      bull Tables bull Entity storage

      bull Queues bull Role coordination

      bull SQL Azure bull SQL in the cloud

      Key Components Fabric Controller

      bull Think of it as an automated IT department

      bull ldquoCloud Layerrdquo on top of

      bull Windows Server 2008

      bull A custom version of Hyper-V called the Windows Azure Hypervisor

      bull Allows for automated management of virtual machines

      Key Components Fabric Controller

      bull Think of it as an automated IT department bull ldquoCloud Layerrdquo on top of

      bull Windows Server 2008

      bull A custom version of Hyper-V called the Windows Azure Hypervisor bull Allows for automated management of virtual machines

      bull Itrsquos job is to provision deploy monitor and maintain applications in data centers

      bull Applications have a ldquoshaperdquo and a ldquoconfigurationrdquo bull The configuration definition describes the shape of a service

      bull Role types

      bull Role VM sizes

      bull External and internal endpoints

      bull Local storage

      bull The configuration settings configures a service bull Instance count

      bull Storage keys

      bull Application-specific settings

      Key Components Fabric Controller

      bull Manages ldquonodesrdquo and ldquoedgesrdquo in the ldquofabricrdquo (the hardware) bull Power-on automation devices bull Routers Switches bull Hardware load balancers bull Physical servers bull Virtual servers

      bull State transitions bull Current State bull Goal State bull Does what is needed to reach and maintain the goal state

      bull Itrsquos a perfect IT employee bull Never sleeps bull Doesnrsquot ever ask for raise bull Always does what you tell it to do in configuration definition and settings

      Creating a New Project

      Windows Azure Compute

      Key Components ndash Compute Web Roles

      Web Front End

      bull Cloud web server

      bull Web pages

      bull Web services

      You can create the following types

      bull ASPNET web roles

      bull ASPNET MVC 2 web roles

      bull WCF service web roles

      bull Worker roles

      bull CGI-based web roles

      Key Components ndash Compute Worker Roles

      bull Utility compute

      bull Windows Server 2008

      bull Background processing

      bull Each role can define an amount of local storage

      bull Protected space on the local drive considered volatile storage

      bull May communicate with outside services

      bull Azure Storage

      bull SQL Azure

      bull Other Web services

      bull Can expose external and internal endpoints

      Suggested Application Model Using queues for reliable messaging

      Scalable Fault Tolerant Applications

      Queues are the application glue bull Decouple parts of application easier to scale independently bull Resource allocation different priority queues and backend servers bull Mask faults in worker roles (reliable messaging)

      Key Components ndash Compute VM Roles

      bull Customized Role

      bull You own the box

      bull How it works

      bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

      bull Customize the OS as you need to

      bull Upload the differences VHD

      bull Azure runs your VM role using

      bull Base OS

      bull Differences VHD

      Application Hosting

      lsquoGrokkingrsquo the service model

      bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

      bull The service model is the same diagram written down in a declarative format

      bull You give the Fabric the service model and the binaries that go with each of those nodes

      bull The Fabric can provision deploy and manage that diagram for you

      bull Find hardware home

      bull Copy and launch your app binaries

      bull Monitor your app and the hardware

      bull In case of failure take action Perhaps even relocate your app

      bull At all times the lsquodiagramrsquo stays whole

      Automated Service Management Provide code + service model

      bull Platform identifies and allocates resources deploys the service manages service health

      bull Configuration is handled by two files

      ServiceDefinitioncsdef

      ServiceConfigurationcscfg

      Service Definition

      Service Configuration

      GUI

      Double click on Role Name in Azure Project

      Deploying to the cloud

      bull We can deploy from the portal or from script

      bull VS builds two files

      bull Encrypted package of your code

      bull Your config file

      bull You must create an Azure account then a service and then you deploy your code

      bull Can take up to 20 minutes

      bull (which is better than six months)

      Service Management API

      bullREST based API to manage your services

      bullX509-certs for authentication

      bullLets you create delete change upgrade swaphellip

      bullLots of community and MSFT-built tools around the API

      - Easy to roll your own

      The Secret Sauce ndash The Fabric

      The Fabric is the lsquobrainrsquo behind Windows Azure

      1 Process service model

      1 Determine resource requirements

      2 Create role images

      2 Allocate resources

      3 Prepare nodes

      1 Place role images on nodes

      2 Configure settings

      3 Start roles

      4 Configure load balancers

      5 Maintain service health

      1 If role fails restart the role based on policy

      2 If node fails migrate the role based on policy

      Storage

      Durable Storage At Massive Scale

      Blob

      - Massive files eg videos logs

      Drive

      - Use standard file system APIs

      Tables - Non-relational but with few scale limits

      - Use SQL Azure for relational data

      Queues

      - Facilitate loosely-coupled reliable systems

      Blob Features and Functions

      bull Store Large Objects (up to 1TB in size)

      bull Can be served through Windows Azure CDN service

      bull Standard REST Interface

      bull PutBlob

      bull Inserts a new blob overwrites the existing blob

      bull GetBlob

      bull Get whole blob or a specific range

      bull DeleteBlob

      bull CopyBlob

      bull SnapshotBlob

      bull LeaseBlob

      Two Types of Blobs Under the Hood

      bull Block Blob

      bull Targeted at streaming workloads

      bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

      bull Size limit 200GB per blob

      bull Page Blob

      bull Targeted at random readwrite workloads

      bull Each blob consists of an array of pages bull Each page is identified by its offset

      from the start of the blob

      bull Size limit 1TB per blob

      Windows Azure Drive

      bull Provides a durable NTFS volume for Windows Azure applications to use

      bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

      bull Enables migrating existing NTFS applications to the cloud

      bull A Windows Azure Drive is a Page Blob

      bull Example mount Page Blob as X bull httpltaccountnamegtblobcorewindowsnetltcontainernamegtltblobnamegt

      bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

      bull Drive persists even when not mounted as a Page Blob

      Windows Azure Tables

      bull Provides Structured Storage

      bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

      bull Can use thousands of servers as traffic grows

      bull Highly Available amp Durable bull Data is replicated several times

      bull Familiar and Easy to use API

      bull WCF Data Services and OData bull NET classes and LINQ

      bull REST ndash with any platform or language

      Windows Azure Queues

      bull Queue are performance efficient highly available and provide reliable message delivery

      bull Simple asynchronous work dispatch

      bull Programming semantics ensure that a message can be processed at least once

      bull Access is provided via REST

      Storage Partitioning

      Understanding partitioning is key to understanding performance

      bull Different for each data type (blobs entities queues) Every data object has a partition key

      bull A partition can be served by a single server

      bull System load balances partitions based on traffic pattern

      bull Controls entity locality Partition key is unit of scale

      bull Load balancing can take a few minutes to kick in

      bull Can take a couple of seconds for partition to be available on a different server

      System load balances

      bull Use exponential backoff on ldquoServer Busyrdquo

      bull Our system load balances to meet your traffic needs

      bull Single partition limits have been reached Server Busy

      Partition Keys In Each Abstraction

      bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

      PartitionKey (CustomerId) RowKey (RowKind)

      Name CreditCardNumber OrderTotal

      1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

      1 Order ndash 1 $3512

      2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

      2 Order ndash 3 $1000

      bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

      bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

      Container Name Blob Name

      image annarborbighousejpg

      image foxboroughgillettejpg

      video annarborbighousejpg

      Queue Message

      jobs Message1

      jobs Message2

      workflow Message1

      Scalability Targets

      Storage Account

      bull Capacity ndash Up to 100 TBs

      bull Transactions ndash Up to a few thousand requests per second

      bull Bandwidth ndash Up to a few hundred megabytes per second

      Single QueueTable Partition

      bull Up to 500 transactions per second

      To go above these numbers partition between multiple storage accounts and partitions

      When limit is hit app will see lsquo503 server busyrsquo applications should implement exponential backoff

      Single Blob Partition

      bull Throughput up to 60 MBs

      PartitionKey (Category)

      RowKey (Title)

      Timestamp ReleaseDate

      Action Fast amp Furious hellip 2009

      Action The Bourne Ultimatum hellip 2007

      hellip hellip hellip hellip

      Animation Open Season 2 hellip 2009

      Animation The Ant Bully hellip 2006

      PartitionKey (Category)

      RowKey (Title)

      Timestamp ReleaseDate

      Comedy Office Space hellip 1999

      hellip hellip hellip hellip

      SciFi X-Men Origins Wolverine hellip 2009

      hellip hellip hellip hellip

      War Defiance hellip 2008

      PartitionKey (Category)

      RowKey (Title)

      Timestamp ReleaseDate

      Action Fast amp Furious hellip 2009

      Action The Bourne Ultimatum hellip 2007

      hellip hellip hellip hellip

      Animation Open Season 2 hellip 2009

      Animation The Ant Bully hellip 2006

      hellip hellip hellip hellip

      Comedy Office Space hellip 1999

      hellip hellip hellip hellip

      SciFi X-Men Origins Wolverine hellip 2009

      hellip hellip hellip hellip

      War Defiance hellip 2008

      Partitions and Partition Ranges

      Key Selection Things to Consider

      bullDistribute load as much as possible bullHot partitions can be load balanced bullPartitionKey is critical for scalability

      See httpwwwmicrosoftpdccom2009SVC09 and httpazurescopecloudappnet for more information

      bull Avoid frequent large scans bull Parallelize queries bull Point queries are most efficient

      bullTransactions across a single partition bullTransaction semantics amp Reduce round trips

      Scalability

      Query Efficiency amp Speed

      Entity group transactions

      Expect Continuation Tokens ndash Seriously

      Maximum of 1000 rows in a response

      At the end of partition range boundary

      Maximum of 1000 rows in a response

      At the end of partition range boundary

      Maximum of 5 seconds to execute the query

      Tables Recap bullEfficient for frequently used queries

      bullSupports batch transactions

      bullDistributes load

      Select PartitionKey and RowKey that help scale

      Avoid ldquoAppend onlyrdquo patterns

      Always Handle continuation tokens

      ldquoORrdquo predicates are not optimized

      Implement back-off strategy for retries

      bullDistribute by using a hash etc as prefix

      bullExpect continuation tokens for range queries

      bullExecute the queries that form the ldquoORrdquo predicates as separate queries

      bullServer busy

      bullLoad balance partitions to meet traffic needs

      bullLoad on single partition has exceeded the limits

      WCF Data Services

      bullUse a new context for each logical operation

      bullAddObjectAttachTo can throw exception if entity is already being tracked

      bullPoint query throws an exception if resource does not exist Use IgnoreResourceNotFoundException

      Queues Their Unique Role in Building Reliable Scalable Applications

      bull Want roles that work closely together but are not bound together bull Tight coupling leads to brittleness

      bull This can aid in scaling and performance

      bull A queue can hold an unlimited number of messages bull Messages must be serializable as XML

      bull Limited to 8KB in size

      bull Commonly use the work ticket pattern

      bull Why not simply use a table

      Queue Terminology

      Message Lifecycle

      Queue

      Msg 1

      Msg 2

      Msg 3

      Msg 4

      Worker Role

      Worker Role

      PutMessage

      Web Role

      GetMessage (Timeout) RemoveMessage

      Msg 2 Msg 1

      Worker Role

      Msg 2

      POST httpmyaccountqueuecorewindowsnetmyqueuemessages

      HTTP11 200 OK Transfer-Encoding chunked Content-Type applicationxml Date Tue 09 Dec 2008 210430 GMT Server Nephos Queue Service Version 10 Microsoft-HTTPAPI20

      ltxml version=10 encoding=utf-8gt ltQueueMessagesListgt ltQueueMessagegt ltMessageIdgt5974b586-0df3-4e2d-ad0c-18e3892bfca2ltMessageIdgt ltInsertionTimegtMon 22 Sep 2008 232920 GMTltInsertionTimegt ltExpirationTimegtMon 29 Sep 2008 232920 GMTltExpirationTimegt ltPopReceiptgtYzQ4Yzg1MDIGM0MDFiZDAwYzEwltPopReceiptgt ltTimeNextVisiblegtTue 23 Sep 2008 052920GMTltTimeNextVisiblegt ltMessageTextgtPHRlc3Q+dGdGVzdD4=ltMessageTextgt ltQueueMessagegt ltQueueMessagesListgt

      DELETE httpmyaccountqueuecorewindowsnetmyqueuemessagesmessageidpopreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

      Truncated Exponential Back Off Polling

      Consider a backoff polling approach Each empty poll

      increases interval by 2x

      A successful sets the interval back to 1

      44

      2 1

      1 1

      C1

      C2

      Removing Poison Messages

      1 1

      2 1

      3 4 0

      Producers Consumers

      P2

      P1

      3 0

      2 GetMessage(Q 30 s) msg 2

      1 GetMessage(Q 30 s) msg 1

      1 1

      2 1

      1 0

      2 0

      45

      C1

      C2

      Removing Poison Messages

      3 4 0

      Producers Consumers

      P2

      P1

      1 1

      2 1

      2 GetMessage(Q 30 s) msg 2 3 C2 consumed msg 2 4 DeleteMessage(Q msg 2) 7 GetMessage(Q 30 s) msg 1

      1 GetMessage(Q 30 s) msg 1 5 C1 crashed

      1 1

      2 1

      6 msg1 visible 30 s after Dequeue 3 0

      1 2

      1 1

      1 2

      46

      C1

      C2

      Removing Poison Messages

      3 4 0

      Producers Consumers

      P2

      P1

      1 2

      2 Dequeue(Q 30 sec) msg 2 3 C2 consumed msg 2 4 Delete(Q msg 2) 7 Dequeue(Q 30 sec) msg 1 8 C2 crashed

      1 Dequeue(Q 30 sec) msg 1 5 C1 crashed 10 C1 restarted 11 Dequeue(Q 30 sec) msg 1 12 DequeueCount gt 2 13 Delete (Q msg1) 1

      2

      6 msg1 visible 30s after Dequeue 9 msg1 visible 30s after Dequeue

      3 0

      1 3

      1 2

      1 3

      Queues Recap

      bullNo need to deal with failures Make message

      processing idempotent

      bullInvisible messages result in out of order Do not rely on order

      bullEnforce threshold on messagersquos dequeue count Use Dequeue count to remove

      poison messages

      bullMessages gt 8KB

      bullBatch messages

      bullGarbage collect orphaned blobs

      bullDynamically increasereduce workers

      Use blob to store message data with

      reference in message

      Use message count to scale

      bullNo need to deal with failures

      bullInvisible messages result in out of order

      bullEnforce threshold on messagersquos dequeue count

      bullDynamically increasereduce workers

      Windows Azure Storage Takeaways

      Blobs

      Drives

      Tables

      Queues

      httpblogsmsdncomwindowsazurestorage

      httpazurescopecloudappnet

      49

      A Quick Exercise

      hellipThen letrsquos look at some code and some tools

      50

      Code ndash AccountInformationcs public class AccountInformation private static string storageKey = ldquotHiSiSnOtMyKeY private static string accountName = jjstore private static StorageCredentialsAccountAndKey credentials internal static StorageCredentialsAccountAndKey Credentials get if (credentials == null) credentials = new StorageCredentialsAccountAndKey(accountName storageKey) return credentials

      51

      Code ndash BlobHelpercs public class BlobHelper private static string defaultContainerName = school private CloudBlobClient client = null private CloudBlobContainer container = null private void InitContainer() if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudBlobClient() container = clientGetContainerReference(defaultContainerName) containerCreateIfNotExist() BlobContainerPermissions permissions = containerGetPermissions() permissionsPublicAccess = BlobContainerPublicAccessTypeContainer containerSetPermissions(permissions)

      52

      Code ndash BlobHelpercs

      public void WriteFileToBlob(string filePath) if (client == null || container == null) InitContainer() FileInfo file = new FileInfo(filePath) CloudBlob blob = containerGetBlobReference(fileName) blobPropertiesContentType = GetContentType(fileExtension) blobUploadFile(fileFullName) Or if you want to write a string replace the last line with blobUploadText(someString) And make sure you set the content type to the appropriate MIME type (eg ldquotextplainrdquo)

      53

      Code ndash BlobHelpercs

      public string GetBlobText(string blobName) if (client == null || container == null) InitContainer() CloudBlob blob = containerGetBlobReference(blobName) try return blobDownloadText() catch (Exception) The blob probably does not exist or there is no connection available return null

      54

      Application Code - Blobs private void SaveToCloudButton_Click(object sender RoutedEventArgs e) StringBuilder buff = new StringBuilder() buffAppendLine(LastNameFirstNameEmailBirthdayNativeLanguageFavoriteIceCreamYearsInPhDGraduated) foreach (AttendeeEntity attendee in attendees) buffAppendLine(attendeeToCsvString()) blobHelperWriteStringToBlob(SummerSchoolAttendeestxt buffToString())

      The blob is now available at httpltAccountNamegtblobcorewindowsnetltContainerNamegtltBlobNamegt Or in this case httpjjstoreblobcorewindowsnetschoolSummerSchoolAttendeestxt

      55

      Code - TableEntities using MicrosoftWindowsAzureStorageClient public class AttendeeEntity TableServiceEntity public string FirstName get set public string LastName get set public string Email get set public DateTime Birthday get set public string FavoriteIceCream get set public int YearsInPhD get set public bool Graduated get set hellip

      56

      Code - TableEntities public void UpdateFrom(AttendeeEntity other) FirstName = otherFirstName LastName = otherLastName Email = otherEmail Birthday = otherBirthday FavoriteIceCream = otherFavoriteIceCream YearsInPhD = otherYearsInPhD Graduated = otherGraduated UpdateKeys() public void UpdateKeys() PartitionKey = SummerSchool RowKey = Email

      57

      Code ndash TableHelpercs public class TableHelper private CloudTableClient client = null private TableServiceContext context = null private DictionaryltstringAttendeeEntitygt allAttendees = null private string tableName = Attendees private CloudTableClient Client get if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudTableClient() return client private TableServiceContext Context get if (context == null) context = ClientGetDataServiceContext() return context

      58

      Code ndash TableHelpercs private void ReadAllAttendees() allAttendees = new Dictionaryltstring AttendeeEntitygt() CloudTableQueryltAttendeeEntitygt query = ContextCreateQueryltAttendeeEntitygt(tableName)AsTableServiceQuery() try foreach (AttendeeEntity attendee in query) allAttendees[attendeeEmail] = attendee catch (Exception) No entries in table - or other exception

      59

      Code ndash TableHelpercs public void DeleteAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return AttendeeEntity attendee = allAttendees[email] Delete from the cloud table ContextDeleteObject(attendee) ContextSaveChanges() Delete from the memory cache allAttendeesRemove(email)

      60

      Code ndash TableHelpercs public AttendeeEntity GetAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return allAttendees[email] return null

      Remember that this only works for tables (or queries on tables) that easily fit in memory This is one of many design patterns for working with tables

      61

      Pseudo Code ndash TableHelpercs public void UpdateAttendees(ListltAttendeeEntitygt updatedAttendees) foreach (AttendeeEntity attendee in updatedAttendees) UpdateAttendee(attendee false) ContextSaveChanges(SaveChangesOptionsBatch) public void UpdateAttendee(AttendeeEntity attendee) UpdateAttendee(attendee true) private void UpdateAttendee(AttendeeEntity attendee bool saveChanges) if (allAttendeesContainsKey(attendeeEmail)) AttendeeEntity existingAttendee = allAttendees[attendeeEmail] existingAttendeeUpdateFrom(attendee) ContextUpdateObject(existingAttendee) else ContextAddObject(tableName attendee) if (saveChanges) ContextSaveChanges()

      62

      Application Code ndash Cloud Tables private void SaveButton_Click(object sender RoutedEventArgs e) Write to table tableHelperUpdateAttendees(attendees)

      Thatrsquos it Now your tables are accessible using REST service calls or any cloud storage tool

      63

      Tools ndash Fiddler2

      Best Practices

      Picking the Right VM Size

      bull Having the correct VM size can make a big difference in costs

      bull Fundamental choice ndash larger fewer VMs vs many smaller instances

      bull If you scale better than linear across cores larger VMs could save you money

      bull Pretty rare to see linear scaling across 8 cores

      bull More instances may provide better uptime and reliability (more failures needed to take your service down)

      bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

      Using Your VM to the Maximum

      Remember bull 1 role instance == 1 VM running Windows

      bull 1 role instance = one specific task for your code

      bull Yoursquore paying for the entire VM so why not use it

      bull Common mistake ndash split up code into multiple roles each not using up CPU

      bull Balance between using up CPU vs having free capacity in times of need

      bull Multiple ways to use your CPU to the fullest

      Exploiting Concurrency

      bull Spin up additional processes each with a specific task or as a unit of concurrency

      bull May not be ideal if number of active processes exceeds number of cores

      bull Use multithreading aggressively

      bull In networking code correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

      bull In NET 4 use the Task Parallel Library

      bull Data parallelism

      bull Task parallelism

      Finding Good Code Neighbors

      bull Typically code falls into one or more of these categories

      bull Find code that is intensive with different resources to live together

      bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

      Memory Intensive

      CPU Intensive

      Network IO Intensive

      Storage IO Intensive

      Scaling Appropriately

      bull Monitor your application and make sure yoursquore scaled appropriately (not over-scaled)

      bull Spinning VMs up and down automatically is good at large scale

      bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

      bull Being too aggressive in spinning down VMs can result in poor user experience

      bull Trade-off between risk of failurepoor user experience due to not having excess capacity and the costs of having idling VMs

      Performance Cost

      Storage Costs

      bull Understand an applicationrsquos storage profile and how storage billing works

      bull Make service choices based on your app profile

      bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

      bull Service choice can make a big cost difference based on your app profile

      bull Caching and compressing They help a lot with storage costs

      Saving Bandwidth Costs

      Bandwidth costs are a huge part of any popular web apprsquos billing profile

      Sending fewer things over the wire often means getting fewer things from storage

      Saving bandwidth costs often lead to savings in other places

      Sending fewer things means your VM has time to do other tasks

      All of these tips have the side benefit of improving your web apprsquos performance and user experience

      Compressing Content

      1 Gzip all output content

      bull All modern browsers can decompress on the fly

      bull Compared to Compress Gzip has much better compression and freedom from patented algorithms

      2Tradeoff compute costs for storage size

      3Minimize image sizes

      bull Use Portable Network Graphics (PNGs)

      bull Crush your PNGs

      bull Strip needless metadata

      bull Make all PNGs palette PNGs

      Uncompressed

      Content

      Compressed

      Content

      Gzip

      Minify JavaScript

      Minify CCS

      Minify Images

      Best Practices Summary

      Doing lsquolessrsquo is the key to saving costs

      Measure everything

      Know your application profile in and out

      Research Examples in the Cloud

      hellipon another set of slides

      Map Reduce on Azure

      bull Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web bull Hadoop implementation bull Hadoop has a long history and has been improved for stability bull Originally Designed for Cluster Systems

      bull Microsoft Research this week is announcing a project code named Daytona for Map Reduce jobs on Azure bull Designed from the start to use cloud primitives bull Built-in fault tolerance bull REST based interface for writing your own clients

      76

      Project Daytona - Map Reduce on Azure

      httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

      77

      Questions and Discussionhellip

      Thank you for hosting me at the Summer School

      BLAST (Basic Local Alignment Search Tool)

      bull The most important software in bioinformatics

      bull Identify similarity between bio-sequences

      Computationally intensive

      bull Large number of pairwise alignment operations

      bull A BLAST running can take 700 ~ 1000 CPU hours

      bull Sequence databases growing exponentially

      bull GenBank doubled in size in about 15 months

      It is easy to parallelize BLAST

      bull Segment the input

      bull Segment processing (querying) is pleasingly parallel

      bull Segment the database (eg mpiBLAST)

      bull Needs special result reduction processing

      Large volume data

      bull A normal Blast database can be as large as 10GB

      bull 100 nodes means the peak storage bandwidth could reach to 1TB

      bull The output of BLAST is usually 10-100x larger than the input

      bull Parallel BLAST engine on Azure

      bull Query-segmentation data-parallel pattern bull split the input sequences

      bull query partitions in parallel

      bull merge results together when done

      bull Follows the general suggested application model bull Web Role + Queue + Worker

      bull With three special considerations bull Batch job management

      bull Task parallelism on an elastic Cloud

      Wei Lu Jared Jackson and Roger Barga AzureBlast A Case Study of Developing Science Applications on the Cloud in Proceedings of the 1st Workshop

      on Scientific Cloud Computing (Science Cloud 2010) Association for Computing Machinery Inc 21 June 2010

      A simple SplitJoin pattern

      Leverage multi-core of one instance bull argument ldquondashardquo of NCBI-BLAST

      bull 1248 for small middle large and extra large instance size

      Task granularity bull Large partition load imbalance

      bull Small partition unnecessary overheads bull NCBI-BLAST overhead

      bull Data transferring overhead

      Best Practice test runs to profiling and set size to mitigate the overhead

      Value of visibilityTimeout for each BLAST task bull Essentially an estimate of the task run time

      bull too small repeated computation

      bull too large unnecessary long period of waiting time in case of the instance failure Best Practice

      bull Estimate the value based on the number of pair-bases in the partition and test-runs

      bull Watch out for the 2-hour maximum limitation

      BLAST task

      Splitting task

      BLAST task

      BLAST task

      BLAST task

      hellip

      Merging Task

      Task size vs Performance

      bull Benefit of the warm cache effect

      bull 100 sequences per partition is the best choice

      Instance size vs Performance

      bull Super-linear speedup with larger size worker instances

      bull Primarily due to the memory capability

      Task SizeInstance Size vs Cost

      bull Extra-large instance generated the best and the most economical throughput

      bull Fully utilize the resource

      Web

      Portal

      Web

      Service

      Job registration

      Job Scheduler

      Worker

      Worker

      Worker

      Global

      dispatch

      queue

      Web Role

      Azure Table

      Job Management Role

      Azure Blob

      Database

      updating Role

      hellip

      Scaling Engine

      Blast databases

      temporary data etc)

      Job Registry NCBI databases

      BLAST task

      Splitting task

      BLAST task

      BLAST task

      BLAST task

      hellip

      Merging Task

      ASPNET program hosted by a web role instance bull Submit jobs

      bull Track jobrsquos status and logs

      AuthenticationAuthorization based on Live ID

      The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states


      R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

      Blasted ~5000 proteins (700K sequences):

      • Against all NCBI non-redundant proteins: completed in 30 min

      • Against ~5000 proteins from another strain: completed in less than 30 sec

      AzureBLAST significantly saved computing time…

      Discovering homologs: discover the interrelationships of known protein sequences

      "All against all" query: the database is also the input query

      • The protein database is large (about 4.2 GB), with 9,865,668 sequences to be queried in total

      • Theoretically 100 billion sequence comparisons

      Performance estimation, based on sample runs on one extra-large Azure instance:

      • Would require 3,216,731 minutes (6.1 years) on one desktop

      One of the biggest BLAST jobs we know of; experiments at this scale are usually infeasible for most scientists

      • Allocated a total of ~4000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe

      • 8 deployments of AzureBLAST, each with its own co-located storage service

      • The 10 million sequences were divided into multiple segments; each segment was submitted to one deployment as one job for execution, and each segment consists of smaller partitions

      • When load imbalances appeared, the load was redistributed manually


      • Total size of the output result is ~230 GB

      • The total number of hits is 1,764,579,487

      • Started on March 25th; the last task completed on April 8th (10 days of compute)

      • Based on our estimates, though, the real working instance time should be 6-8 days

      • Look into the log data to analyze what took place…


      A normal log record should look like the following; otherwise something is wrong (e.g., the task failed to complete):

      3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
      3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
      3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
      3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
      3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
      3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
      3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
      3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
      3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

      North Europe data center: 34,256 tasks processed in total

      • All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, within ~30 mins): this is an update domain at work

      • 35 nodes experienced blob-writing failures at the same time

      West Europe data center: 30,976 tasks were completed, and the job was killed

      • A reasonable guess: the fault domain is at work

      MODISAzure Computing Evapotranspiration (ET) in The Cloud

      You never miss the water till the well has run dry Irish Proverb

      ET = Water volume evapotranspired (m3 s-1 m-2)

      Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

      λv = Latent heat of vaporization (Jg)

      Rn = Net radiation (W m-2)

      cp = Specific heat capacity of air (J kg-1 K-1)

      ρa = dry air density (kg m-3)

      δq = vapor pressure deficit (Pa)

      ga = Conductivity of air (inverse of ra) (m s-1)

      gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

      γ = Psychrometric constant (γ asymp 66 Pa K-1)

      Estimating resistanceconductivity across a

      catchment can be tricky

      bull Lots of inputs big data reduction

      bull Some of the inputs are not so simple

      ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

      Penman-Monteith (1964)

      Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

      Input datasets:

      • NASA MODIS imagery source archives: 5 TB (600K files)

      • FLUXNET curated sensor dataset: 30 GB (960 files)

      • FLUXNET curated field dataset: 2 KB (1 file)

      • NCEP/NCAR reanalysis data: ~100 MB (4K files)

      • Vegetative clumping: ~5 MB (1 file)

      • Climate classification: ~1 MB (1 file)

      20 US year = 1 global year

      Data collection (map) stage

      • Downloads requested input tiles from NASA FTP sites

      • Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

      Reprojection (map) stage

      • Converts source tile(s) to intermediate-result sinusoidal tiles

      • Simple nearest-neighbor or spline algorithms

      Derivation reduction stage

      • First stage visible to the scientist

      • Computes ET in our initial use

      Analysis reduction stage

      • Optional second stage visible to the scientist

      • Enables production of science analysis artifacts such as maps, tables, and virtual sensors

      [Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests; a Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue connect the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source metadata is recorded, imagery is pulled from the source download sites, and scientists download the scientific results.]

      http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

      • The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue

      • The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks, i.e., recoverable units of work

      • The execution status of all jobs and tasks is persisted in Tables

      [Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage> Task Queue.]

      All work is actually done by a Worker Role, which

      • sandboxes the science (or other) executable

      • marshals all storage from/to Azure blob storage to/from local files on the Azure Worker instance

      [Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read the <Input>Data Storage.]

      The GenericWorker

      • dequeues tasks created by the Service Monitor

      • retries failed tasks 3 times

      • maintains all task status

      A sketch of such a worker loop follows.
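
      An illustrative sketch only (not the MODISAzure source): dequeue a task, marshal the input blob to local storage, run the sandboxed executable, upload the result, and give up on tasks that keep failing. The queue encoding, container layout, scratch path, and "reproject.exe" name are placeholders.

      using System;
      using System.Diagnostics;
      using System.IO;
      using Microsoft.WindowsAzure.StorageClient;

      static class GenericWorkerSketch
      {
          public static void ProcessOneTask(CloudQueue taskQueue, CloudBlobContainer input,
                                            CloudBlobContainer output, string scratchDir)
          {
              CloudQueueMessage task = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
              if (task == null) return;
              if (task.DequeueCount > 3) { taskQueue.DeleteMessage(task); return; }  // retried 3 times already

              string tileName = task.AsString;                                  // hypothetical task encoding
              string localIn = Path.Combine(scratchDir, tileName);
              string localOut = localIn + ".out";
              input.GetBlobReference(tileName).DownloadToFile(localIn);         // marshal blob -> local file

              using (Process p = Process.Start("reproject.exe",                 // placeholder executable
                                               "\"" + localIn + "\" \"" + localOut + "\""))
              {
                  p.WaitForExit();
              }

              output.GetBlobReference(tileName + ".out").UploadFile(localOut);  // marshal local file -> blob
              taskQueue.DeleteMessage(task);                                    // delete only after success
          }
      }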

      [Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work through the Job Queue and Task Queue to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.]

      • ReprojectionJobStatus: each entity specifies a single reprojection job request

      • ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)

      • SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile

      • ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

      bull Computational costs driven by data scale and need to run reduction multiple times

      bull Storage costs driven by data scale and 6 month project duration

      bull Small with respect to the people costs even at graduate student rates

      [Cost figure, overlaid on the pipeline diagram:]

      • Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage

      • Reprojection stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download

      • Derivation Reduction stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage

      • Analysis Reduction stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

      Total: $1420

      • Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

      • Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

      • Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

      • Clouds provide valuable fault-tolerance and scalability abstractions

      • Clouds act as an amplifier for familiar client tools and on-premises compute

      • Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

      • Day 2 - Azure as a PaaS
      • Day 2 - Applications

        4

        Web Application Model Comparison

        Machines Running IIS ASPNET

        Machines Running Windows Services

        Machines Running SQL Server

        Ad Hoc Application Model

        5

        Web Application Model Comparison

        Machines Running IIS ASPNET

        Machines Running Windows Services

        Machines Running SQL Server

        Ad Hoc Application Model

        Web Role Instances Worker Role

        Instances

        Azure Storage Blob Queue Table

        SQL Azure

        Windows Azure Application Model

        Key Components Fabric Controller

        bull Manages hardware and virtual machines for service

        Compute bull Web Roles

        bull Web application front end

        bull Worker Roles bull Utility compute

        bull VM Roles bull Custom compute role bull You own and customize the VM

        Storage bull Blobs

        bull Binary objects

        bull Tables bull Entity storage

        bull Queues bull Role coordination

        bull SQL Azure bull SQL in the cloud

        Key Components Fabric Controller

        bull Think of it as an automated IT department

        bull ldquoCloud Layerrdquo on top of

        bull Windows Server 2008

        bull A custom version of Hyper-V called the Windows Azure Hypervisor

        bull Allows for automated management of virtual machines

        Key Components Fabric Controller

        bull Think of it as an automated IT department bull ldquoCloud Layerrdquo on top of

        bull Windows Server 2008

        bull A custom version of Hyper-V called the Windows Azure Hypervisor bull Allows for automated management of virtual machines

        • Its job is to provision, deploy, monitor, and maintain applications in data centers

        • Applications have a "shape" and a "configuration": the service definition describes the shape of a service

        bull Role types

        bull Role VM sizes

        bull External and internal endpoints

        bull Local storage

        bull The configuration settings configures a service bull Instance count

        bull Storage keys

        bull Application-specific settings

        Key Components Fabric Controller

        bull Manages ldquonodesrdquo and ldquoedgesrdquo in the ldquofabricrdquo (the hardware) bull Power-on automation devices bull Routers Switches bull Hardware load balancers bull Physical servers bull Virtual servers

        bull State transitions bull Current State bull Goal State bull Does what is needed to reach and maintain the goal state

        • It's a perfect IT employee: it never sleeps, doesn't ever ask for a raise, and always does what you tell it to do in the service definition and configuration settings

        Creating a New Project

        Windows Azure Compute

        Key Components ndash Compute Web Roles

        Web Front End

        bull Cloud web server

        bull Web pages

        bull Web services

        You can create the following types

        bull ASPNET web roles

        bull ASPNET MVC 2 web roles

        bull WCF service web roles

        bull Worker roles

        bull CGI-based web roles

        Key Components ndash Compute Worker Roles

        bull Utility compute

        bull Windows Server 2008

        bull Background processing

        bull Each role can define an amount of local storage

        bull Protected space on the local drive considered volatile storage

        bull May communicate with outside services

        bull Azure Storage

        bull SQL Azure

        bull Other Web services

        bull Can expose external and internal endpoints

        Suggested Application Model: using queues for reliable messaging

        Scalable, Fault-Tolerant Applications

        Queues are the application glue:
        • Decouple parts of the application, so each is easier to scale independently
        • Resource allocation: different priority queues and backend servers
        • Mask faults in worker roles (reliable messaging)
        A minimal sketch of this pattern follows.
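
        A minimal sketch of the pattern with the StorageClient library; the queue name "workitems", the jobId ticket format, and the ProcessJob delegate are illustrative choices, not from the deck. The front end enqueues a small work ticket; the worker deletes it only after success, so a crash simply makes the ticket visible again.

        using System;
        using System.Threading;
        using Microsoft.WindowsAzure.StorageClient;

        static class WorkTicketSketch
        {
            // Web role front end: enqueue a ticket; large payloads belong in blobs, referenced by the ticket.
            public static void Submit(CloudQueueClient queues, string jobId)
            {
                CloudQueue work = queues.GetQueueReference("workitems");
                work.CreateIfNotExist();
                work.AddMessage(new CloudQueueMessage(jobId));
            }

            // Worker role loop: process tickets as they arrive.
            public static void Run(CloudQueueClient queues, Action<string> ProcessJob)
            {
                CloudQueue work = queues.GetQueueReference("workitems");
                work.CreateIfNotExist();
                while (true)
                {
                    CloudQueueMessage msg = work.GetMessage(TimeSpan.FromMinutes(5));  // invisible while processed
                    if (msg == null) { Thread.Sleep(1000); continue; }
                    ProcessJob(msg.AsString);
                    work.DeleteMessage(msg);
                }
            }
        }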

        Key Components ndash Compute VM Roles

        bull Customized Role

        bull You own the box

        bull How it works

        bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

        bull Customize the OS as you need to

        bull Upload the differences VHD

        bull Azure runs your VM role using

        bull Base OS

        bull Differences VHD

        Application Hosting

        lsquoGrokkingrsquo the service model

        bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

        bull The service model is the same diagram written down in a declarative format

        bull You give the Fabric the service model and the binaries that go with each of those nodes

        bull The Fabric can provision deploy and manage that diagram for you

        bull Find hardware home

        bull Copy and launch your app binaries

        bull Monitor your app and the hardware

        bull In case of failure take action Perhaps even relocate your app

        bull At all times the lsquodiagramrsquo stays whole

        Automated Service Management Provide code + service model

        bull Platform identifies and allocates resources deploys the service manages service health

        bull Configuration is handled by two files

        ServiceDefinitioncsdef

        ServiceConfigurationcscfg

        Service Definition

        Service Configuration

        GUI

        Double click on Role Name in Azure Project

        Deploying to the cloud

        bull We can deploy from the portal or from script

        bull VS builds two files

        bull Encrypted package of your code

        bull Your config file

        bull You must create an Azure account then a service and then you deploy your code

        bull Can take up to 20 minutes

        bull (which is better than six months)

        Service Management API

        • REST-based API to manage your services

        • X.509 certificates for authentication

        • Lets you create, delete, change, upgrade, swap, …

        • Lots of community- and MSFT-built tools around the API; it's easy to roll your own (a hedged sketch follows)
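
        A hedged sketch of calling the Service Management REST API with a management certificate, assuming a "list hosted services" request. The subscription ID, certificate thumbprint, and the exact x-ms-version value are placeholders you would substitute for your own.

        using System.IO;
        using System.Net;
        using System.Security.Cryptography.X509Certificates;

        static class ManagementApiSketch
        {
            public static string ListHostedServices(string subscriptionId, string certThumbprint)
            {
                var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
                store.Open(OpenFlags.ReadOnly);
                X509Certificate2 cert = store.Certificates
                    .Find(X509FindType.FindByThumbprint, certThumbprint, false)[0];

                var request = (HttpWebRequest)WebRequest.Create(
                    "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
                request.ClientCertificates.Add(cert);
                request.Headers.Add("x-ms-version", "2011-06-01");   // use the version your tooling targets

                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();                        // XML list of hosted services
                }
            }
        }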

        The Secret Sauce ndash The Fabric

        The Fabric is the lsquobrainrsquo behind Windows Azure

        1. Process the service model
           1. Determine resource requirements
           2. Create role images
        2. Allocate resources
        3. Prepare nodes
           1. Place role images on nodes
           2. Configure settings
           3. Start roles
        4. Configure load balancers
        5. Maintain service health
           1. If a role fails, restart the role based on policy
           2. If a node fails, migrate the role based on policy

        Storage

        Durable Storage At Massive Scale

        Blob

        - Massive files eg videos logs

        Drive

        - Use standard file system APIs

        Tables - Non-relational but with few scale limits

        - Use SQL Azure for relational data

        Queues

        - Facilitate loosely-coupled reliable systems

        Blob Features and Functions

        bull Store Large Objects (up to 1TB in size)

        bull Can be served through Windows Azure CDN service

        bull Standard REST Interface

        bull PutBlob

        bull Inserts a new blob overwrites the existing blob

        bull GetBlob

        bull Get whole blob or a specific range

        bull DeleteBlob

        bull CopyBlob

        bull SnapshotBlob

        bull LeaseBlob

        Two Types of Blobs Under the Hood

        bull Block Blob

        bull Targeted at streaming workloads

        bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

        bull Size limit 200GB per blob

        bull Page Blob

        bull Targeted at random readwrite workloads

        bull Each blob consists of an array of pages bull Each page is identified by its offset

        from the start of the blob

        bull Size limit 1TB per blob

        Windows Azure Drive

        bull Provides a durable NTFS volume for Windows Azure applications to use

        bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

        bull Enables migrating existing NTFS applications to the cloud

        bull A Windows Azure Drive is a Page Blob

        • Example: mount a Page Blob as X: at http://<accountname>.blob.core.windows.net/<containername>/<blobname>

        bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

        bull Drive persists even when not mounted as a Page Blob

        Windows Azure Tables

        bull Provides Structured Storage

        bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

        bull Can use thousands of servers as traffic grows

        bull Highly Available amp Durable bull Data is replicated several times

        bull Familiar and Easy to use API

        bull WCF Data Services and OData bull NET classes and LINQ

        bull REST ndash with any platform or language

        Windows Azure Queues

        bull Queue are performance efficient highly available and provide reliable message delivery

        bull Simple asynchronous work dispatch

        bull Programming semantics ensure that a message can be processed at least once

        bull Access is provided via REST

        Storage Partitioning

        Understanding partitioning is key to understanding performance

        bull Different for each data type (blobs entities queues) Every data object has a partition key

        bull A partition can be served by a single server

        bull System load balances partitions based on traffic pattern

        bull Controls entity locality Partition key is unit of scale

        bull Load balancing can take a few minutes to kick in

        bull Can take a couple of seconds for partition to be available on a different server

        System load balances

        bull Use exponential backoff on ldquoServer Busyrdquo

        bull Our system load balances to meet your traffic needs

        bull Single partition limits have been reached Server Busy

        Partition Keys In Each Abstraction

        • Entities: TableName + PartitionKey. Entities with the same PartitionKey value are served from the same partition.

          PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
          1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
          1                         | Order - 1             |              |                     | $35.12
          2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
          2                         | Order - 3             |              |                     | $10.00

        • Blobs: Container name + Blob name. Every blob and its snapshots are in a single partition.

          Container Name | Blob Name
          image          | annarborbighouse.jpg
          image          | foxboroughgillette.jpg
          video          | annarborbighouse.jpg

        • Messages: Queue name. All messages for a single queue belong to the same partition.

          Queue    | Message
          jobs     | Message1
          jobs     | Message2
          workflow | Message1

        Scalability Targets

        Storage Account

        bull Capacity ndash Up to 100 TBs

        bull Transactions ndash Up to a few thousand requests per second

        bull Bandwidth ndash Up to a few hundred megabytes per second

        Single QueueTable Partition

        bull Up to 500 transactions per second

        To go above these numbers, partition between multiple storage accounts and partitions

        When the limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff (a sketch follows)

        Single Blob Partition

        • Throughput up to 60 MB/s
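
        A hedged sketch of that back-off, wrapped around an arbitrary storage call; the retry count and delay cap are arbitrary choices, not service guidance.

        using System;
        using System.Threading;
        using Microsoft.WindowsAzure.StorageClient;

        static class RetryHelper
        {
            public static void WithBackoff(Action storageCall)
            {
                for (int attempt = 1; ; attempt++)
                {
                    try
                    {
                        storageCall();                        // e.g., () => queue.AddMessage(...)
                        return;
                    }
                    catch (StorageServerException)            // covers "server busy"-style 5xx responses
                    {
                        if (attempt >= 5) throw;
                        int delayMs = Math.Min(30000, 100 * (1 << attempt));   // 200 ms, 400 ms, ... capped
                        Thread.Sleep(delayMs);
                    }
                }
            }
        }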

        Partitions and Partition Ranges

        Example Movies table (PartitionKey = Category, RowKey = Title):

          PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
          Action                  | Fast & Furious           | ...       | 2009
          Action                  | The Bourne Ultimatum     | ...       | 2007
          ...                     | ...                      | ...       | ...
          Animation               | Open Season 2            | ...       | 2009
          Animation               | The Ant Bully            | ...       | 2006
          ...                     | ...                      | ...       | ...
          Comedy                  | Office Space             | ...       | 1999
          ...                     | ...                      | ...       | ...
          SciFi                   | X-Men Origins: Wolverine | ...       | 2009
          ...                     | ...                      | ...       | ...
          War                     | Defiance                 | ...       | 2008

        The table can be split into partition ranges (e.g., Action-Animation and Comedy-War), each of which can be served from a different server.

        Key Selection: Things to Consider

        • Scalability: distribute load as much as possible; hot partitions can be load-balanced; the PartitionKey is critical for scalability

        • Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient

        • Entity group transactions: transactions work across a single partition; transaction semantics reduce round trips

        See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

        Expect Continuation Tokens - Seriously

        A continuation token can be returned:

        • when a response reaches the maximum of 1000 rows

        • at the end of a partition range boundary

        • when a query reaches the maximum of 5 seconds of execution time

        A sketch of handling them follows.
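
        A sketch of both options, reusing the AttendeeEntity/"Attendees" names from the code later in this deck: the StorageClient's AsTableServiceQuery() follows continuation tokens for you, while the raw WCF Data Services loop handles them explicitly.

        using System.Data.Services.Client;
        using Microsoft.WindowsAzure.StorageClient;

        static class ContinuationSketch
        {
            public static void ReadAll(TableServiceContext context)
            {
                var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");
                QueryOperationResponse<AttendeeEntity> response =
                    (QueryOperationResponse<AttendeeEntity>)query.Execute();
                while (true)
                {
                    foreach (AttendeeEntity entity in response) { /* process the entity */ }
                    DataServiceQueryContinuation<AttendeeEntity> token = response.GetContinuation();
                    if (token == null) break;              // no more segments
                    response = context.Execute(token);     // fetch the next segment
                }
                // Alternatively: context.CreateQuery<AttendeeEntity>("Attendees").AsTableServiceQuery()
                // enumerates across continuation tokens transparently.
            }
        }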

        Tables Recap

        • Efficient for frequently used queries; supports batch (entity group) transactions; distributes load

        • Select a PartitionKey and RowKey that help scale; avoid "append only" patterns by distributing keys, e.g., using a hash as a prefix (see the sketch below)

        • Always handle continuation tokens; expect them for range queries

        • "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries

        • Implement a back-off strategy for retries: "server busy" means the load on a single partition has exceeded the limits, and the system load-balances partitions to meet traffic needs
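
        One hedged way to do the hash-prefix trick from the recap; the bucket count and key formatting are arbitrary illustrative choices.

        using System.Security.Cryptography;
        using System.Text;

        static class PartitionKeyHelper
        {
            // Spreads an "append only" key space (timestamps, sequential IDs) across partitions.
            public static string WithHashPrefix(string naturalKey, int bucketCount)
            {
                using (MD5 md5 = MD5.Create())                         // stable across processes, unlike GetHashCode()
                {
                    byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(naturalKey));
                    int bucket = hash[0] % bucketCount;
                    return bucket.ToString("D2") + "-" + naturalKey;   // e.g., "07-2011-07-20T10:00:00Z"
                }
            }
            // Range queries must then be issued once per bucket (and expect continuation tokens for each).
        }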

        WCF Data Services

        • Use a new context for each logical operation

        • AddObject/AttachTo can throw an exception if the entity is already being tracked

        • A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
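
        A sketch of the point-query advice, reusing the AttendeeEntity/"Attendees"/"SummerSchool" names from the code later in this deck.

        using System.Linq;
        using Microsoft.WindowsAzure.StorageClient;

        static class PointQuerySketch
        {
            public static AttendeeEntity TryGetAttendee(TableServiceContext context, string email)
            {
                context.IgnoreResourceNotFoundException = true;    // missing row => empty result, not an exception
                var query = from a in context.CreateQuery<AttendeeEntity>("Attendees")
                            where a.PartitionKey == "SummerSchool" && a.RowKey == email
                            select a;
                foreach (AttendeeEntity a in query)                // point query: at most one row comes back
                    return a;
                return null;
            }
        }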

        Queues Their Unique Role in Building Reliable Scalable Applications

        • Want roles that work closely together but are not bound together: tight coupling leads to brittleness

        • This can aid in scaling and performance

        • A queue can hold an unlimited number of messages; messages must be serializable as XML and are limited to 8 KB in size

        • Commonly use the work-ticket pattern

        • Why not simply use a table?

        Queue Terminology

        Message Lifecycle

        [Diagram, message lifecycle: a Web Role calls PutMessage to add messages (Msg 1-4) to the queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue them and RemoveMessage to delete them once processed.]

        PutMessage:
        POST http://myaccount.queue.core.windows.net/myqueue/messages

        GetMessage response:
        HTTP/1.1 200 OK
        Transfer-Encoding: chunked
        Content-Type: application/xml
        Date: Tue, 09 Dec 2008 21:04:30 GMT
        Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

        <?xml version="1.0" encoding="utf-8"?>
        <QueueMessagesList>
          <QueueMessage>
            <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
            <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
            <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
            <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
            <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
            <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
          </QueueMessage>
        </QueueMessagesList>

        DeleteMessage:
        DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

        Truncated Exponential Back-Off Polling

        Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful dequeue resets the interval back to 1 (sketch below).
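
        A sketch of that truncated back-off poll; the 64-second cap and the ProcessMessage delegate are illustrative.

        using System;
        using System.Threading;
        using Microsoft.WindowsAzure.StorageClient;

        static class PollingSketch
        {
            public static void Poll(CloudQueue queue, Action<CloudQueueMessage> ProcessMessage)
            {
                int delaySeconds = 1;
                while (true)
                {
                    CloudQueueMessage msg = queue.GetMessage();
                    if (msg != null)
                    {
                        ProcessMessage(msg);
                        queue.DeleteMessage(msg);
                        delaySeconds = 1;                                      // success resets the interval
                    }
                    else
                    {
                        Thread.Sleep(TimeSpan.FromSeconds(delaySeconds));
                        delaySeconds = Math.Min(delaySeconds * 2, 64);         // each empty poll doubles it, truncated
                    }
                }
            }
        }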

        Removing Poison Messages

        [Diagram: producers P1 and P2 enqueue messages onto queue Q; consumers C1 and C2 dequeue them.]

        1. C1: Dequeue(Q, 30 s) -> msg 1
        2. C2: Dequeue(Q, 30 s) -> msg 2
        3. C2 consumes msg 2
        4. C2: Delete(Q, msg 2)
        5. C1 crashes
        6. msg 1 becomes visible again 30 s after its dequeue
        7. C2: Dequeue(Q, 30 s) -> msg 1
        8. C2 crashes
        9. msg 1 becomes visible again 30 s after its dequeue
        10. C1 restarts
        11. C1: Dequeue(Q, 30 s) -> msg 1
        12. msg 1's DequeueCount > 2
        13. C1: Delete(Q, msg 1) - the poison message is removed

        Queues Recap

        • Make message processing idempotent: then there is no need to deal with failures specially

        • Do not rely on message order: invisible messages result in out-of-order delivery

        • Enforce a threshold on a message's dequeue count: use DequeueCount to remove poison messages (see the sketch below)

        • Messages > 8 KB: use a blob to store the message data and put a reference in the message; garbage-collect orphaned blobs

        • Batch messages where it helps

        • Use the message count to scale: dynamically increase or reduce the number of workers
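
        A hedged sketch of the DequeueCount rule from the recap; the threshold of 3 and the dead-letter queue are illustrative choices.

        using System;
        using Microsoft.WindowsAzure.StorageClient;

        static class PoisonMessageSketch
        {
            public static void ProcessOne(CloudQueue queue, CloudQueue deadLetterQueue,
                                          Action<string> ProcessMessage)
            {
                CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
                if (msg == null) return;

                if (msg.DequeueCount > 3)                                            // seen too many times: poison
                {
                    deadLetterQueue.AddMessage(new CloudQueueMessage(msg.AsString)); // park it for inspection
                    queue.DeleteMessage(msg);
                    return;
                }

                ProcessMessage(msg.AsString);                                        // must be idempotent
                queue.DeleteMessage(msg);
            }
        }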

        Windows Azure Storage Takeaways

        Blobs

        Drives

        Tables

        Queues

        http://blogs.msdn.com/windowsazurestorage

        http://azurescope.cloudapp.net

        A Quick Exercise

        …Then let's look at some code and some tools

        Code - AccountInformation.cs

        public class AccountInformation
        {
            private static string storageKey = "tHiSiSnOtMyKeY";
            private static string accountName = "jjstore";
            private static StorageCredentialsAccountAndKey credentials;

            internal static StorageCredentialsAccountAndKey Credentials
            {
                get
                {
                    if (credentials == null)
                        credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
                    return credentials;
                }
            }
        }

        Code - BlobHelper.cs

        public class BlobHelper
        {
            private static string defaultContainerName = "school";
            private CloudBlobClient client = null;
            private CloudBlobContainer container = null;

            private void InitContainer()
            {
                if (client == null)
                    client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
                container = client.GetContainerReference(defaultContainerName);
                container.CreateIfNotExist();
                BlobContainerPermissions permissions = container.GetPermissions();
                permissions.PublicAccess = BlobContainerPublicAccessType.Container;
                container.SetPermissions(permissions);
            }
        }

        Code - BlobHelper.cs

        public void WriteFileToBlob(string filePath)
        {
            if (client == null || container == null)
                InitContainer();
            FileInfo file = new FileInfo(filePath);
            CloudBlob blob = container.GetBlobReference(file.Name);
            blob.Properties.ContentType = GetContentType(file.Extension);
            blob.UploadFile(file.FullName);
            // Or, if you want to write a string, replace the last line with:
            //   blob.UploadText(someString);
            // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
        }

        Code - BlobHelper.cs

        public string GetBlobText(string blobName)
        {
            if (client == null || container == null)
                InitContainer();
            CloudBlob blob = container.GetBlobReference(blobName);
            try
            {
                return blob.DownloadText();
            }
            catch (Exception)
            {
                // The blob probably does not exist, or there is no connection available
                return null;
            }
        }

        Application Code - Blobs

        private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
        {
            StringBuilder buff = new StringBuilder();
            buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
            foreach (AttendeeEntity attendee in attendees)
                buff.AppendLine(attendee.ToCsvString());
            blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
        }

        The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

        Code - TableEntities

        using Microsoft.WindowsAzure.StorageClient;

        public class AttendeeEntity : TableServiceEntity
        {
            public string FirstName { get; set; }
            public string LastName { get; set; }
            public string Email { get; set; }
            public DateTime Birthday { get; set; }
            public string FavoriteIceCream { get; set; }
            public int YearsInPhD { get; set; }
            public bool Graduated { get; set; }
            ...
        }

        Code - TableEntities

        public void UpdateFrom(AttendeeEntity other)
        {
            FirstName = other.FirstName;
            LastName = other.LastName;
            Email = other.Email;
            Birthday = other.Birthday;
            FavoriteIceCream = other.FavoriteIceCream;
            YearsInPhD = other.YearsInPhD;
            Graduated = other.Graduated;
            UpdateKeys();
        }

        public void UpdateKeys()
        {
            PartitionKey = "SummerSchool";
            RowKey = Email;
        }

        Code ndash TableHelpercs public class TableHelper private CloudTableClient client = null private TableServiceContext context = null private DictionaryltstringAttendeeEntitygt allAttendees = null private string tableName = Attendees private CloudTableClient Client get if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudTableClient() return client private TableServiceContext Context get if (context == null) context = ClientGetDataServiceContext() return context

        58

        Code ndash TableHelpercs private void ReadAllAttendees() allAttendees = new Dictionaryltstring AttendeeEntitygt() CloudTableQueryltAttendeeEntitygt query = ContextCreateQueryltAttendeeEntitygt(tableName)AsTableServiceQuery() try foreach (AttendeeEntity attendee in query) allAttendees[attendeeEmail] = attendee catch (Exception) No entries in table - or other exception

        59

        Code ndash TableHelpercs public void DeleteAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return AttendeeEntity attendee = allAttendees[email] Delete from the cloud table ContextDeleteObject(attendee) ContextSaveChanges() Delete from the memory cache allAttendeesRemove(email)

        60

        Code ndash TableHelpercs public AttendeeEntity GetAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return allAttendees[email] return null

        Remember that this only works for tables (or queries on tables) that easily fit in memory This is one of many design patterns for working with tables

        61

        Pseudo Code ndash TableHelpercs public void UpdateAttendees(ListltAttendeeEntitygt updatedAttendees) foreach (AttendeeEntity attendee in updatedAttendees) UpdateAttendee(attendee false) ContextSaveChanges(SaveChangesOptionsBatch) public void UpdateAttendee(AttendeeEntity attendee) UpdateAttendee(attendee true) private void UpdateAttendee(AttendeeEntity attendee bool saveChanges) if (allAttendeesContainsKey(attendeeEmail)) AttendeeEntity existingAttendee = allAttendees[attendeeEmail] existingAttendeeUpdateFrom(attendee) ContextUpdateObject(existingAttendee) else ContextAddObject(tableName attendee) if (saveChanges) ContextSaveChanges()

        62

        Application Code ndash Cloud Tables private void SaveButton_Click(object sender RoutedEventArgs e) Write to table tableHelperUpdateAttendees(attendees)

        Thatrsquos it Now your tables are accessible using REST service calls or any cloud storage tool

        63

        Tools ndash Fiddler2

        Best Practices

        Picking the Right VM Size

        bull Having the correct VM size can make a big difference in costs

        bull Fundamental choice ndash larger fewer VMs vs many smaller instances

        bull If you scale better than linear across cores larger VMs could save you money

        bull Pretty rare to see linear scaling across 8 cores

        bull More instances may provide better uptime and reliability (more failures needed to take your service down)

        bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

        Using Your VM to the Maximum

        Remember bull 1 role instance == 1 VM running Windows

        bull 1 role instance = one specific task for your code

        bull Yoursquore paying for the entire VM so why not use it

        bull Common mistake ndash split up code into multiple roles each not using up CPU

        bull Balance between using up CPU vs having free capacity in times of need

        bull Multiple ways to use your CPU to the fullest

        Exploiting Concurrency

        • Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores

        • Use multithreading aggressively

        • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads

        • In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (sketch below)
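
        A small .NET 4 Task Parallel Library sketch showing both kinds of parallelism; the blob container and the two work delegates are illustrative.

        using System;
        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.WindowsAzure.StorageClient;

        static class ConcurrencySketch
        {
            public static void Process(CloudBlobContainer container, IEnumerable<string> blobNames,
                                       Action cleanupWork, Action<string> ProcessDocument)
            {
                Task cleanup = Task.Factory.StartNew(cleanupWork);        // task parallelism: independent work

                Parallel.ForEach(blobNames, blobName =>                   // data parallelism across cores
                {
                    string text = container.GetBlobReference(blobName).DownloadText();
                    ProcessDocument(text);                                // CPU-bound work per item
                });

                cleanup.Wait();
            }
        }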

        Finding Good Code Neighbors

        bull Typically code falls into one or more of these categories

        bull Find code that is intensive with different resources to live together

        bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

        (Resource categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive.)

        Scaling Appropriately

        bull Monitor your application and make sure yoursquore scaled appropriately (not over-scaled)

        bull Spinning VMs up and down automatically is good at large scale

        bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

        bull Being too aggressive in spinning down VMs can result in poor user experience

        bull Trade-off between risk of failurepoor user experience due to not having excess capacity and the costs of having idling VMs

        Performance Cost

        Storage Costs

        bull Understand an applicationrsquos storage profile and how storage billing works

        bull Make service choices based on your app profile

        bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

        bull Service choice can make a big cost difference based on your app profile

        bull Caching and compressing They help a lot with storage costs

        Saving Bandwidth Costs

        Bandwidth costs are a huge part of any popular web apprsquos billing profile

        Sending fewer things over the wire often means getting fewer things from storage

        Saving bandwidth costs often lead to savings in other places

        Sending fewer things means your VM has time to do other tasks

        All of these tips have the side benefit of improving your web apprsquos performance and user experience

        Compressing Content

        1. Gzip all output content (see the ASP.NET sketch below)

        • All modern browsers can decompress on the fly

        • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

        2. Trade off compute costs for storage size

        3. Minimize image sizes

        • Use Portable Network Graphics (PNGs)

        • Crush your PNGs

        • Strip needless metadata

        • Make all PNGs palette PNGs

        [Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]
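
        A sketch of point 1 for an ASP.NET web role, in Global.asax: compress only when the client advertises gzip support. Already-compressed content types (images, zips) would deserve an extra check in real code.

        using System;
        using System.IO.Compression;
        using System.Web;

        public class Global : HttpApplication
        {
            protected void Application_BeginRequest(object sender, EventArgs e)
            {
                string accept = Request.Headers["Accept-Encoding"];
                if (!string.IsNullOrEmpty(accept) && accept.Contains("gzip"))
                {
                    Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
                    Response.AppendHeader("Content-Encoding", "gzip");
                }
            }
        }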

        Best Practices Summary

        Doing lsquolessrsquo is the key to saving costs

        Measure everything

        Know your application profile in and out

        Research Examples in the Cloud

        hellipon another set of slides

        Map Reduce on Azure

        • Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web: a Hadoop implementation; Hadoop has a long history and has been improved for stability, but was originally designed for cluster systems

        • Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure: designed from the start to use cloud primitives, with built-in fault tolerance and a REST-based interface for writing your own clients

        Project Daytona - Map Reduce on Azure

        http://research.microsoft.com/en-us/projects/azure/daytona.aspx

        Questions and Discussionhellip

        Thank you for hosting me at the Summer School

        BLAST (Basic Local Alignment Search Tool)

        bull The most important software in bioinformatics

        bull Identify similarity between bio-sequences

        Computationally intensive

        bull Large number of pairwise alignment operations

        bull A BLAST running can take 700 ~ 1000 CPU hours

        bull Sequence databases growing exponentially

        bull GenBank doubled in size in about 15 months

        It is easy to parallelize BLAST

        bull Segment the input

        bull Segment processing (querying) is pleasingly parallel

        bull Segment the database (eg mpiBLAST)

        bull Needs special result reduction processing

        Large volume data

        bull A normal Blast database can be as large as 10GB

        bull 100 nodes means the peak storage bandwidth could reach to 1TB

        bull The output of BLAST is usually 10-100x larger than the input

        bull Parallel BLAST engine on Azure

        bull Query-segmentation data-parallel pattern bull split the input sequences

        bull query partitions in parallel

        bull merge results together when done

        bull Follows the general suggested application model bull Web Role + Queue + Worker

        bull With three special considerations bull Batch job management

        bull Task parallelism on an elastic Cloud

        Wei Lu Jared Jackson and Roger Barga AzureBlast A Case Study of Developing Science Applications on the Cloud in Proceedings of the 1st Workshop

        on Scientific Cloud Computing (Science Cloud 2010) Association for Computing Machinery Inc 21 June 2010

        A simple SplitJoin pattern

        Leverage multi-core of one instance bull argument ldquondashardquo of NCBI-BLAST

        bull 1248 for small middle large and extra large instance size

        Task granularity bull Large partition load imbalance

        bull Small partition unnecessary overheads bull NCBI-BLAST overhead

        bull Data transferring overhead

        Best Practice test runs to profiling and set size to mitigate the overhead

        Value of visibilityTimeout for each BLAST task bull Essentially an estimate of the task run time

        bull too small repeated computation

        bull too large unnecessary long period of waiting time in case of the instance failure Best Practice

        bull Estimate the value based on the number of pair-bases in the partition and test-runs

        bull Watch out for the 2-hour maximum limitation

        BLAST task

        Splitting task

        BLAST task

        BLAST task

        BLAST task

        hellip

        Merging Task

        Task size vs Performance

        bull Benefit of the warm cache effect

        bull 100 sequences per partition is the best choice

        Instance size vs Performance

        bull Super-linear speedup with larger size worker instances

        bull Primarily due to the memory capability

        Task SizeInstance Size vs Cost

        bull Extra-large instance generated the best and the most economical throughput

        bull Fully utilize the resource

        Web

        Portal

        Web

        Service

        Job registration

        Job Scheduler

        Worker

        Worker

        Worker

        Global

        dispatch

        queue

        Web Role

        Azure Table

        Job Management Role

        Azure Blob

        Database

        updating Role

        hellip

        Scaling Engine

        Blast databases

        temporary data etc)

        Job Registry NCBI databases

        BLAST task

        Splitting task

        BLAST task

        BLAST task

        BLAST task

        hellip

        Merging Task

        ASPNET program hosted by a web role instance bull Submit jobs

        bull Track jobrsquos status and logs

        AuthenticationAuthorization based on Live ID

        The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states

        Web Portal

        Web Service

        Job registration

        Job Scheduler

        Job Portal

        Scaling Engine

        Job Registry

        R palustris as a platform for H2 production Eric Shadt SAGE Sam Phattarasukol Harwood Lab UW

        Blasted ~5000 proteins (700K sequences) bull Against all NCBI non-redundant proteins completed in 30 min

        bull Against ~5000 proteins from another strain completed in less than 30 sec

        AzureBLAST significantly saved computing timehellip

        Discovering Homologs bull Discover the interrelationships of known protein sequences

        ldquoAll against Allrdquo query bull The database is also the input query

        bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

        bull Theoretically 100 billion sequence comparisons

        Performance estimation bull Based on the sampling-running on one extra-large Azure instance

        bull Would require 3216731 minutes (61 years) on one desktop

        One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

        bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

        bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

        bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

        bull Each segment consists of smaller partitions

        bull When load imbalances redistribute the load manually

        5

        0

        62 6

        2 6

        2 6

        2 6

        2 5

        0 62

        bull Total size of the output result is ~230GB

        bull The number of total hits is 1764579487

        bull Started at March 25th the last task completed on April 8th (10 days compute)

        bull But based our estimates real working instance time should be 6~8 day

        bull Look into log data to analyze what took placehellip

        5

        0

        62 6

        2 6

        2 6

        2 6

        2 5

        0 62

        A normal log record should be

        Otherwise something is wrong (eg task failed to complete)

        3312010 614 RD00155D3611B0 Executing the task 251523

        3312010 625 RD00155D3611B0 Execution of task 251523 is done it took 109mins

        3312010 625 RD00155D3611B0 Executing the task 251553

        3312010 644 RD00155D3611B0 Execution of task 251553 is done it took 193mins

        3312010 644 RD00155D3611B0 Executing the task 251600

        3312010 702 RD00155D3611B0 Execution of task 251600 is done it took 1727 mins

        3312010 822 RD00155D3611B0 Executing the task 251774

        3312010 950 RD00155D3611B0 Executing the task 251895

        3312010 1112 RD00155D3611B0 Execution of task 251895 is done it took 82 mins

        North Europe Data Center totally 34256 tasks processed

        All 62 compute nodes lost tasks

        and then came back in a group

        This is an Update domain

        ~30 mins

        ~ 6 nodes in one group

        35 Nodes experience blob

        writing failure at same

        time

        West Europe Datacenter 30976 tasks are completed and job was killed

        A reasonable guess the

        Fault Domain is working

        MODISAzure Computing Evapotranspiration (ET) in The Cloud

        You never miss the water till the well has run dry Irish Proverb

        ET = Water volume evapotranspired (m3 s-1 m-2)

        Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

        λv = Latent heat of vaporization (Jg)

        Rn = Net radiation (W m-2)

        cp = Specific heat capacity of air (J kg-1 K-1)

        ρa = dry air density (kg m-3)

        δq = vapor pressure deficit (Pa)

        ga = Conductivity of air (inverse of ra) (m s-1)

        gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

        γ = Psychrometric constant (γ asymp 66 Pa K-1)

        Estimating resistanceconductivity across a

        catchment can be tricky

        bull Lots of inputs big data reduction

        bull Some of the inputs are not so simple

        119864119879 = ∆119877119899 + 120588119886 119888119901 120575119902 119892119886

        (∆ + 120574 1 + 119892119886 119892119904 )120582120592

        Penman-Monteith (1964)

        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

        NASA MODIS

        imagery source

        archives

        5 TB (600K files)

        FLUXNET curated

        sensor dataset

        (30GB 960 files)

        FLUXNET curated

        field dataset

        2 KB (1 file)

        NCEPNCAR

        ~100MB

        (4K files)

        Vegetative

        clumping

        ~5MB (1file)

        Climate

        classification

        ~1MB (1file)

        20 US year = 1 global year

        Data collection (map) stage

        bull Downloads requested input tiles

        from NASA ftp sites

        bull Includes geospatial lookup for

        non-sinusoidal tiles that will

        contribute to a reprojected

        sinusoidal tile

        Reprojection (map) stage

        bull Converts source tile(s) to

        intermediate result sinusoidal tiles

        bull Simple nearest neighbor or spline

        algorithms

        Derivation reduction stage

        bull First stage visible to scientist

        bull Computes ET in our initial use

        Analysis reduction stage

        bull Optional second stage visible to

        scientist

        bull Enables production of science

        analysis artifacts such as maps

        tables virtual sensors

        Reduction 1

        Queue

        Source

        Metadata

        AzureMODIS

        Service Web Role Portal

        Request

        Queue

        Scientific

        Results

        Download

        Data Collection Stage

        Source Imagery Download Sites

        Reprojection

        Queue

        Reduction 2

        Queue

        Download

        Queue

        Scientists

        Science results

        Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

        httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

        bull ModisAzure Service is the Web Role front door bull Receives all user requests

        bull Queues request to appropriate Download Reprojection or Reduction Job Queue

        bull Service Monitor is a dedicated Worker Role bull Parses all job requests into tasks ndash

        recoverable units of work

        bull Execution status of all jobs and tasks persisted in Tables

        ltPipelineStagegt

        Request

        hellip ltPipelineStagegtJobStatus

        Persist ltPipelineStagegtJob Queue

        MODISAzure Service

        (Web Role)

        Service Monitor

        (Worker Role)

        Parse amp Persist ltPipelineStagegtTaskStatus

        hellip

        Dispatch

        ltPipelineStagegtTask Queue

        All work actually done by a Worker Role

        bull Sandboxes science or other executable

        bull Marshalls all storage fromto Azure blob storage tofrom local Azure Worker instance files

        Service Monitor

        (Worker Role)

        Parse amp Persist ltPipelineStagegtTaskStatus

        GenericWorker

        (Worker Role)

        hellip

        hellip

        Dispatch

        ltPipelineStagegtTask Queue

        hellip

        ltInputgtData Storage

        bull Dequeues tasks created by the Service Monitor

        bull Retries failed tasks 3 times

        bull Maintains all task status

        Reprojection Request

        hellip

        Service Monitor

        (Worker Role)

        ReprojectionJobStatus Persist

        Parse amp Persist ReprojectionTaskStatus

        GenericWorker

        (Worker Role)

        hellip

        Job Queue

        hellip

        Dispatch

        Task Queue

        Points to

        hellip

        ScanTimeList

        SwathGranuleMeta

        Reprojection Data

        Storage

        Each entity specifies a

        single reprojection job

        request

        Each entity specifies a

        single reprojection task (ie

        a single tile)

        Query this table to get

        geo-metadata (eg

        boundaries) for each swath

        tile

        Query this table to get the

        list of satellite scan times

        that cover a target tile

        Swath Source

        Data Storage

        bull Computational costs driven by data scale and need to run reduction multiple times

        bull Storage costs driven by data scale and 6 month project duration

        bull Small with respect to the people costs even at graduate student rates

        Reduction 1 Queue

        Source Metadata

        Request Queue

        Scientific Results Download

        Data Collection Stage

        Source Imagery Download Sites

        Reprojection Queue

        Reduction 2 Queue

        Download

        Queue

        Scientists

        Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

        400-500 GB

        60K files

        10 MBsec

        11 hours

        lt10 workers

        $50 upload

        $450 storage

        400 GB

        45K files

        3500 hours

        20-100

        workers

        5-7 GB

        55K files

        1800 hours

        20-100

        workers

        lt10 GB

        ~1K files

        1800 hours

        20-100

        workers

        $420 cpu

        $60 download

        $216 cpu

        $1 download

        $6 storage

        $216 cpu

        $2 download

        $9 storage

        AzureMODIS

        Service Web Role Portal

        Total $1420

        bull Clouds are the largest scale computer centers ever constructed and have the

        potential to be important to both large and small scale science problems

        bull Equally import they can increase participation in research providing needed

        resources to userscommunities without ready access

        bull Clouds suitable for ldquoloosely coupledrdquo data parallel applications and can

        support many interesting ldquoprogramming patternsrdquo but tightly coupled low-

        latency applications do not perform optimally on clouds today

        bull Provide valuable fault tolerance and scalability abstractions

        bull Clouds as amplifier for familiar client tools and on premise compute

        bull Clouds services to support research provide considerable leverage for both

        individual researchers and entire communities of researchers

        • Day 2 - Azure as a PaaS
        • Day 2 - Applications

          5

          Web Application Model Comparison

          Machines Running IIS ASPNET

          Machines Running Windows Services

          Machines Running SQL Server

          Ad Hoc Application Model

          Web Role Instances Worker Role

          Instances

          Azure Storage Blob Queue Table

          SQL Azure

          Windows Azure Application Model

          Key Components Fabric Controller

          bull Manages hardware and virtual machines for service

          Compute bull Web Roles

          bull Web application front end

          bull Worker Roles bull Utility compute

          bull VM Roles bull Custom compute role bull You own and customize the VM

          Storage bull Blobs

          bull Binary objects

          bull Tables bull Entity storage

          bull Queues bull Role coordination

          bull SQL Azure bull SQL in the cloud

          Key Components Fabric Controller

          bull Think of it as an automated IT department

          bull ldquoCloud Layerrdquo on top of

          bull Windows Server 2008

          bull A custom version of Hyper-V called the Windows Azure Hypervisor

          bull Allows for automated management of virtual machines

Key Components: Fabric Controller
• Its job is to provision, deploy, monitor, and maintain applications in the data centers
• Applications have a "shape" and a "configuration"
  • The configuration definition describes the shape of a service:
    • Role types
    • Role VM sizes
    • External and internal endpoints
    • Local storage
  • The configuration settings configure a service:
    • Instance count
    • Storage keys
    • Application-specific settings

Key Components: Fabric Controller
• Manages "nodes" and "edges" in the "fabric" (the hardware):
  • Power-on automation devices
  • Routers and switches
  • Hardware load balancers
  • Physical servers
  • Virtual servers
• State transitions:
  • Current state and goal state
  • Does what is needed to reach and maintain the goal state
• It is a perfect IT employee:
  • Never sleeps
  • Never asks for a raise
  • Always does what you tell it to do in the configuration definition and settings

          Creating a New Project

          Windows Azure Compute

Key Components – Compute: Web Roles
Web front end:
• Cloud web server
• Web pages
• Web services
You can create the following types:
• ASP.NET web roles
• ASP.NET MVC 2 web roles
• WCF service web roles
• Worker roles
• CGI-based web roles

Key Components – Compute: Worker Roles
• Utility compute on Windows Server 2008
• Background processing
• Each role can define an amount of local storage
  • Protected space on the local drive, considered volatile storage
• May communicate with outside services:
  • Azure Storage
  • SQL Azure
  • Other web services
• Can expose external and internal endpoints

Suggested Application Model: Using Queues for Reliable Messaging
To build scalable, fault-tolerant applications, queues are the application glue:
• Decouple parts of the application so they can be scaled independently
• Allocate resources by using different priority queues and back-end servers
• Mask faults in worker roles (reliable messaging)
A minimal sketch of the pattern follows.
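The sketch below uses the Microsoft.WindowsAzure.StorageClient library of that era; the queue name ("orders") and the ProcessOrder call are placeholders I made up for illustration, not part of the talk.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class WorkTicketPattern
    {
        const string QueueName = "orders";   // assumed queue name

        // Web role side: enqueue a small "work ticket" that points at the real data.
        public static void EnqueueTicket(CloudStorageAccount account, string blobName)
        {
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(QueueName);
            queue.CreateIfNotExist();
            queue.AddMessage(new CloudQueueMessage(blobName));
        }

        // Worker role side: poll, process, and only then delete the message.
        public static void ProcessOne(CloudStorageAccount account)
        {
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(QueueName);
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30)); // invisible for 30 s
            if (msg == null) return;                  // queue is empty
            ProcessOrder(msg.AsString);               // application-specific work
            queue.DeleteMessage(msg);                 // delete only after success
        }

        static void ProcessOrder(string blobName) { /* application-specific work */ }
    }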

Key Components – Compute: VM Roles
• Customized role: you own the box
• How it works:
  • Download a "guest OS" to Windows Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differencing VHD
  • Azure runs your VM role using the base OS plus your differencing VHD

Application Hosting
'Grokking' the service model
• Imagine white-boarding your service architecture, with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy, and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action; perhaps even relocate your app
  • At all times, the 'diagram' stays whole

Automated Service Management
You provide the code plus a service model:
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  • ServiceDefinition.csdef (the service definition)
  • ServiceConfiguration.cscfg (the service configuration)
In Visual Studio, a GUI for both files is available by double-clicking the role name in the Azure project.
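As a small illustration of the settings side, a role can read a value declared in the .csdef and set in the .cscfg through the ServiceRuntime API; the setting name "DataConnectionString" below is a hypothetical example, not taken from the slides.

    using Microsoft.WindowsAzure.ServiceRuntime;

    public static class Settings
    {
        // "DataConnectionString" is a hypothetical setting name declared in
        // ServiceDefinition.csdef and given a value in ServiceConfiguration.cscfg.
        public static string DataConnectionString
        {
            get { return RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"); }
        }
    }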

Deploying to the Cloud
• You can deploy from the portal or from script
• Visual Studio builds two files:
  • An encrypted package of your code
  • Your configuration file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes (which is better than six months)

Service Management API
• REST-based API to manage your services
• X.509 certificates for authentication
• Lets you create, delete, change, upgrade, swap, and so on
• Lots of community- and Microsoft-built tools are available around the API, and it is easy to roll your own
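A minimal sketch of calling the Service Management API directly, assuming a management certificate has already been uploaded to the subscription; the subscription ID, certificate path/password, and the x-ms-version value are placeholders (the exact version string you need depends on the API release you target).

    using System;
    using System.IO;
    using System.Net;
    using System.Security.Cryptography.X509Certificates;

    class ListHostedServices
    {
        static void Main()
        {
            string subscriptionId = "<subscription-id>";                  // placeholder
            var cert = new X509Certificate2(@"C:\certs\management.pfx",   // placeholder path
                                            "password");                  // placeholder password

            var request = (HttpWebRequest)WebRequest.Create(
                "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
            request.Headers.Add("x-ms-version", "2010-10-28");   // required version header; value is illustrative
            request.ClientCertificates.Add(cert);                 // X.509 client-certificate authentication

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());            // XML list of hosted services
            }
        }
    }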

The Secret Sauce – The Fabric
The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   • Determine resource requirements
   • Create role images
2. Allocate resources
3. Prepare nodes
   • Place role images on nodes
   • Configure settings
   • Start roles
4. Configure load balancers
5. Maintain service health
   • If a role fails, restart the role based on policy
   • If a node fails, migrate the role based on policy

Storage
Durable Storage at Massive Scale
• Blobs: massive files, e.g. videos and logs
• Drives: use standard file system APIs
• Tables: non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues: facilitate loosely coupled, reliable systems

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob: inserts a new blob or overwrites the existing blob
  • GetBlob: gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood
• Block blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access the durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount the Page Blob at http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive's data persists as a Page Blob even when the drive is not mounted
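A sketch of mounting a drive from a role, written from my recollection of the CloudDrive API in the 1.x SDK; the local-resource name, VHD blob name, and sizes are assumptions, and exact method signatures may differ slightly from what shipped.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;   // CloudDrive ships in the separate CloudDrive assembly

    public static class DriveMounter
    {
        public static string MountDataDrive(CloudStorageAccount account)
        {
            // A local storage resource (assumed here to be named "DriveCache") backs the read cache.
            LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
            CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

            // The drive is just a page blob; create it if it does not exist yet.
            CloudDrive drive = account.CreateCloudDrive("drives/data.vhd");   // assumed container/blob name
            try { drive.Create(1024); }                                       // size in MB (illustrative)
            catch (Exception) { /* the drive may already exist */ }

            // Mount returns the local path where the NTFS volume is available.
            return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
        }
    }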

Windows Azure Tables
• Provides structured storage
• Massively scalable tables: billions of entities (rows) and TBs of data; can use thousands of servers as traffic grows
• Highly available and durable: data is replicated several times
• Familiar and easy-to-use API:
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST, with any platform or language

Windows Azure Queues
• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

Storage Partitioning
Understanding partitioning is key to understanding performance.
• The partition key is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server

How the system load-balances
• The system load-balances to meet your traffic needs
• When the limits of a single partition have been reached, you will see "Server Busy" responses
• Use exponential backoff on "Server Busy"
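A minimal retry sketch under stated assumptions: the attempt count, delays, and cap are illustrative choices, and it assumes the StorageClient library surfaces server-side "busy"/timeout errors as StorageServerException.

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    public static class RetryHelper
    {
        // Retries 'operation' with truncated exponential backoff (1 s, 2 s, 4 s, ...)
        // when the storage service signals a server-side problem such as "Server Busy".
        public static void WithBackoff(Action operation)
        {
            TimeSpan delay = TimeSpan.FromSeconds(1);
            for (int attempt = 1; attempt <= 5; attempt++)
            {
                try
                {
                    operation();
                    return;
                }
                catch (StorageServerException)
                {
                    if (attempt == 5) throw;           // give up after the last attempt
                    Thread.Sleep(delay);
                    if (delay < TimeSpan.FromSeconds(32))
                        delay = delay + delay;         // double the wait, up to a cap
                }
            }
        }
    }

Usage would look like RetryHelper.WithBackoff(() => queue.AddMessage(new CloudQueueMessage("work-item"))).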

          Partition Keys In Each Abstraction

          bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

          PartitionKey (CustomerId) RowKey (RowKind)

          Name CreditCardNumber OrderTotal

          1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

          1 Order ndash 1 $3512

          2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

          2 Order ndash 3 $1000

          bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

          bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

          Container Name Blob Name

          image annarborbighousejpg

          image foxboroughgillettejpg

          video annarborbighousejpg

          Queue Message

          jobs Message1

          jobs Message2

          workflow Message1

Scalability Targets
Storage account:
• Capacity: up to 100 TB
• Transactions: up to a few thousand requests per second
• Bandwidth: up to a few hundred megabytes per second
Single queue/table partition:
• Up to 500 transactions per second
Single blob partition:
• Throughput up to 60 MB/s
To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the application will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges
Consider a movie table keyed by PartitionKey (Category) and RowKey (Title):

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | ...       | 2009
  Action                  | The Bourne Ultimatum     | ...       | 2007
  ...                     | ...                      | ...       | ...
  Animation               | Open Season 2            | ...       | 2009
  Animation               | The Ant Bully            | ...       | 2006
  ...                     | ...                      | ...       | ...
  Comedy                  | Office Space             | ...       | 1999
  ...                     | ...                      | ...       | ...
  SciFi                   | X-Men Origins: Wolverine | ...       | 2009
  ...                     | ...                      | ...       | ...
  War                     | Defiance                 | ...       | 2008

The system can split this table at partition boundaries, for example serving the Action–Animation range of partitions from one server and the Comedy–War range from another.

Key Selection: Things to Consider
• Scalability:
  • Distribute load as much as possible
  • Hot partitions can be load balanced
  • PartitionKey is critical for scalability
• Query efficiency and speed:
  • Avoid frequent large scans
  • Parallelize queries
  • Point queries are most efficient
• Entity group transactions:
  • Transactions across a single partition
  • Transaction semantics and fewer round trips
See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously
A query response can stop early and return a continuation token whenever any of the following is reached:
• A maximum of 1,000 rows in a response
• The end of a partition range boundary
• A maximum of 5 seconds to execute the query
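A sketch, assuming the hypothetical entity class and "Movies" table below, of letting the StorageClient library follow continuation tokens for you via AsTableServiceQuery(); without it, a plain query can silently stop at 1,000 rows or at a partition boundary.

    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class MovieQueries
    {
        // MovieEntity and the "Movies" table are assumed for illustration.
        public class MovieEntity : TableServiceEntity
        {
            public string Title { get; set; }
            public int ReleaseDate { get; set; }
        }

        public static int CountActionMovies(CloudStorageAccount account)
        {
            TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

            // AsTableServiceQuery wraps the query so that enumeration transparently
            // requests the next page whenever the server returns a continuation token.
            CloudTableQuery<MovieEntity> query =
                (from m in context.CreateQuery<MovieEntity>("Movies")
                 where m.PartitionKey == "Action"
                 select m).AsTableServiceQuery();

            return query.Count();   // iterates all pages, not just the first 1,000 rows
        }
    }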

Tables Recap
• Select a PartitionKey and RowKey that help scale: this keeps frequently used queries efficient, supports batch transactions, and distributes load
• Avoid "append only" patterns: distribute writes by using a hash or similar value as a key prefix
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server busy" means the load on a single partition has exceeded its limits and the system is load-balancing partitions to meet traffic needs

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This decoupling can aid scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML and are limited to 8 KB in size
  • Commonly used with the work-ticket pattern
• Why not simply use a table?

Queue Terminology: Message Lifecycle
A web role calls PutMessage to add messages (Msg 1 ... Msg 4) to a queue. Worker roles call GetMessage with a visibility timeout to receive a message, and RemoveMessage (delete) once processing has succeeded.

PutMessage request:

  POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:

  HTTP/1.1 200 OK
  Transfer-Encoding: chunked
  Content-Type: application/xml
  Date: Tue, 09 Dec 2008 21:04:30 GMT
  Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

  <?xml version="1.0" encoding="utf-8"?>
  <QueueMessagesList>
    <QueueMessage>
      <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
      <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
      <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
      <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
      <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
      <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
    </QueueMessage>
  </QueueMessagesList>

DeleteMessage request:

  DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Backoff Polling
Consider a backoff polling approach: each empty poll doubles the polling interval (up to a cap), and a successful poll resets the interval back to its starting value.

Removing Poison Messages
Consider two producers (P1, P2) and two consumers (C1, C2) working against a queue Q. A typical failure sequence looks like this (a worker-loop sketch follows):
1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) returns msg 1
12. msg 1 now has DequeueCount > 2, so C1 treats it as a poison message
13. C1: DeleteMessage(Q, msg 1)
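Putting the last two ideas together, here is a minimal worker-role polling loop sketch; the queue name, poison threshold, backoff cap, and the DoWork call are assumptions made for illustration.

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class WorkerLoop
    {
        const int PoisonThreshold = 3;                          // assumed threshold

        public void Run(CloudStorageAccount account)
        {
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("tasks"); // assumed name
            queue.CreateIfNotExist();

            TimeSpan delay = TimeSpan.FromSeconds(1);
            while (true)
            {
                CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
                if (msg == null)
                {
                    Thread.Sleep(delay);                        // empty poll: back off
                    if (delay < TimeSpan.FromSeconds(64)) delay = delay + delay;
                    continue;
                }
                delay = TimeSpan.FromSeconds(1);                // successful poll: reset the interval

                if (msg.DequeueCount > PoisonThreshold)         // poison message: give up on it
                {
                    queue.DeleteMessage(msg);
                    continue;
                }

                DoWork(msg.AsString);                           // must be idempotent
                queue.DeleteMessage(msg);                       // delete only after success
            }
        }

        void DoWork(string payload) { /* application-specific processing */ }
    }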

Queues Recap
• Make message processing idempotent, so that there is no need to deal with failures specially
• Do not rely on order: invisible (in-flight) messages can result in out-of-order processing
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• Messages larger than 8 KB: use a blob to store the message data with a reference in the message (and garbage-collect orphaned blobs), or batch messages
• Use the message count to scale: dynamically increase or reduce the number of workers

Windows Azure Storage Takeaways
Blobs, drives, tables, and queues.
More information:
• http://blogs.msdn.com/windowsazurestorage
• http://azurescope.cloudapp.net

A Quick Exercise
... then let's look at some code and some tools.

Code – AccountInformation.cs

    public class AccountInformation
    {
        private static string storageKey = "tHiSiSnOtMyKeY";
        private static string accountName = "jjstore";
        private static StorageCredentialsAccountAndKey credentials;

        internal static StorageCredentialsAccountAndKey Credentials
        {
            get
            {
                if (credentials == null)
                    credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
                return credentials;
            }
        }
    }

Code – BlobHelper.cs

    public class BlobHelper
    {
        private static string defaultContainerName = "school";
        private CloudBlobClient client = null;
        private CloudBlobContainer container = null;

        private void InitContainer()
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();

            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }

Code – BlobHelper.cs

    public void WriteFileToBlob(string filePath)
    {
        if (client == null || container == null)
            InitContainer();

        FileInfo file = new FileInfo(filePath);
        CloudBlob blob = container.GetBlobReference(file.Name);
        blob.Properties.ContentType = GetContentType(file.Extension);
        blob.UploadFile(file.FullName);

        // Or, if you want to write a string, replace the last line with:
        //     blob.UploadText(someString);
        // and make sure you set the content type to the appropriate MIME type
        // (e.g. "text/plain").
    }

Code – BlobHelper.cs

    public string GetBlobText(string blobName)
    {
        if (client == null || container == null)
            InitContainer();

        CloudBlob blob = container.GetBlobReference(blobName);
        try
        {
            return blob.DownloadText();
        }
        catch (Exception)
        {
            // The blob probably does not exist, or there is no connection available.
            return null;
        }
    }

Application Code – Blobs

    private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
    {
        StringBuilder buff = new StringBuilder();
        buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
        foreach (AttendeeEntity attendee in attendees)
            buff.AppendLine(attendee.ToCsvString());
        blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
    }

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt.

Code – Table Entities

    using Microsoft.WindowsAzure.StorageClient;

    public class AttendeeEntity : TableServiceEntity
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Email { get; set; }
        public DateTime Birthday { get; set; }
        public string FavoriteIceCream { get; set; }
        public int YearsInPhD { get; set; }
        public bool Graduated { get; set; }
        // ...
    }

Code – Table Entities

    public void UpdateFrom(AttendeeEntity other)
    {
        FirstName = other.FirstName;
        LastName = other.LastName;
        Email = other.Email;
        Birthday = other.Birthday;
        FavoriteIceCream = other.FavoriteIceCream;
        YearsInPhD = other.YearsInPhD;
        Graduated = other.Graduated;
        UpdateKeys();
    }

    public void UpdateKeys()
    {
        PartitionKey = "SummerSchool";
        RowKey = Email;
    }

Code – TableHelper.cs

    public class TableHelper
    {
        private CloudTableClient client = null;
        private TableServiceContext context = null;
        private Dictionary<string, AttendeeEntity> allAttendees = null;
        private string tableName = "Attendees";

        private CloudTableClient Client
        {
            get
            {
                if (client == null)
                    client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
                return client;
            }
        }

        private TableServiceContext Context
        {
            get
            {
                if (context == null)
                    context = Client.GetDataServiceContext();
                return context;
            }
        }
    }

Code – TableHelper.cs

    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();
        CloudTableQuery<AttendeeEntity> query =
            Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
        try
        {
            foreach (AttendeeEntity attendee in query)
                allAttendees[attendee.Email] = attendee;
        }
        catch (Exception)
        {
            // No entries in the table - or some other exception.
        }
    }

Code – TableHelper.cs

    public void DeleteAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (!allAttendees.ContainsKey(email))
            return;

        AttendeeEntity attendee = allAttendees[email];

        // Delete from the cloud table.
        Context.DeleteObject(attendee);
        Context.SaveChanges();

        // Delete from the in-memory cache.
        allAttendees.Remove(email);
    }

Code – TableHelper.cs

    public AttendeeEntity GetAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (allAttendees.ContainsKey(email))
            return allAttendees[email];
        return null;
    }

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

    public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
    {
        foreach (AttendeeEntity attendee in updatedAttendees)
            UpdateAttendee(attendee, false);
        Context.SaveChanges(SaveChangesOptions.Batch);
    }

    public void UpdateAttendee(AttendeeEntity attendee)
    {
        UpdateAttendee(attendee, true);
    }

    private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
    {
        if (allAttendees.ContainsKey(attendee.Email))
        {
            AttendeeEntity existingAttendee = allAttendees[attendee.Email];
            existingAttendee.UpdateFrom(attendee);
            Context.UpdateObject(existingAttendee);
        }
        else
        {
            Context.AddObject(tableName, attendee);
        }
        if (saveChanges)
            Context.SaveChanges();
    }

Application Code – Cloud Tables

    private void SaveButton_Click(object sender, RoutedEventArgs e)
    {
        // Write to the table.
        tableHelper.UpdateAttendees(attendees);
    }

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices
Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• The fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money; it is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum
Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• A common mistake is splitting code into multiple roles, each barely using its CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O completion ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (a small example follows)
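A small .NET 4 Task Parallel Library illustration of both flavors mentioned above; the work inside the loop and the two tasks is placeholder code.

    using System;
    using System.Threading.Tasks;

    class TplExamples
    {
        static void Main()
        {
            // Data parallelism: apply the same operation to every element, using all cores.
            double[] values = new double[1000000];
            Parallel.For(0, values.Length, i =>
            {
                values[i] = Math.Sqrt(i);          // placeholder per-element work
            });

            // Task parallelism: run independent pieces of work concurrently.
            Task<int> countTask = Task.Factory.StartNew(() => CountSomething());
            Task ioTask = Task.Factory.StartNew(() => DoBackgroundIo());
            Task.WaitAll(countTask, ioTask);
            Console.WriteLine(countTask.Result);
        }

        static int CountSomething() { return 42; }        // placeholder
        static void DoBackgroundIo() { /* placeholder */ }
    }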

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (and not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off (performance vs. cost) between the risk of failure or poor user experience from not having excess capacity, and the cost of idling VMs

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile; for example, SQL Azure has a flat fee while Windows Azure Tables charges per transaction, so the service choice can make a big cost difference
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth often leads to savings in other places
• Sending fewer things also means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content (see the sketch after the diagram note below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, gzip has much better compression and freedom from patented algorithms
2. Trade compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content is passed through gzip to produce compressed content; also minify JavaScript, minify CSS, and minify images.]
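A minimal sketch of gzip-compressing text before uploading it to blob storage with the standard .NET GZipStream; the blob name, content type, and the idea of setting Content-Encoding are illustrative choices, not prescribed by the slides.

    using System.IO;
    using System.IO.Compression;
    using Microsoft.WindowsAzure.StorageClient;

    public static class CompressedUpload
    {
        // Gzips 'text' and stores it in the given blob, marking the encoding so that
        // browsers (or the CDN) know to decompress it on the fly.
        public static void UploadGzippedText(CloudBlobContainer container, string blobName, string text)
        {
            byte[] raw = System.Text.Encoding.UTF8.GetBytes(text);
            using (MemoryStream buffer = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress, true))
                {
                    gzip.Write(raw, 0, raw.Length);
                }
                buffer.Position = 0;

                CloudBlob blob = container.GetBlobReference(blobName);
                blob.Properties.ContentType = "text/plain";
                blob.Properties.ContentEncoding = "gzip";     // advertise the compression
                blob.UploadFromStream(buffer);
            }
        }
    }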

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile inside and out

Research Examples in the Cloud
... on another set of slides

MapReduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • A Hadoop implementation; Hadoop has a long history and has been hardened for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – MapReduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion...
Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences
Computationally intensive:
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially; GenBank doubled in size in about 15 months
It is easy to parallelize BLAST:
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST): needs special result-reduction processing
Large volume data:
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• A parallel BLAST engine on Azure
• Uses the query-segmentation data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple split/join pattern
• Leverage the multiple cores of one instance: the "-a" argument of NCBI-BLAST is set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes
• Task granularity:
  • Large partitions cause load imbalance
  • Small partitions cause unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
  • Best practice: use test runs to profile, and set the partition size to mitigate the overhead
• Choosing the visibilityTimeout for each BLAST task: it is essentially an estimate of the task run time (a sketch follows)
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of an instance failure
  • Best practice: estimate the value based on the number of pair-bases in the partition and on test runs, and watch out for the 2-hour maximum limitation
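A sketch of turning that estimate into the visibility timeout passed to GetMessage; the per-pair-base rate and the safety margin are made-up calibration placeholders, not values from the project.

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BlastTaskFetcher
    {
        // Hypothetical calibration value obtained from test runs.
        const double SecondsPerMegaPairBase = 40.0;

        public static CloudQueueMessage GetBlastTask(CloudQueue taskQueue, double megaPairBasesInPartition)
        {
            double estimatedSeconds = megaPairBasesInPartition * SecondsPerMegaPairBase;

            // Add a safety margin, but stay under the 2-hour maximum visibility timeout.
            double clampedSeconds = Math.Min(estimatedSeconds * 1.5,
                                             TimeSpan.FromHours(2).TotalSeconds - 1);

            return taskQueue.GetMessage(TimeSpan.FromSeconds(clampedSeconds));
        }
    }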

[Diagram: a splitting task fans out into many BLAST tasks, whose outputs are combined by a merging task.]
Task size vs. performance:
• Benefit of the warm-cache effect
• 100 sequences per partition was the best choice
Instance size vs. performance:
• Super-linear speedup with larger worker instances, primarily due to the memory capacity
Task size / instance size vs. cost:
• The extra-large instance generated the best and most economical throughput by fully utilizing the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role contains the job scheduler, the job registry (Azure Table), and the scaling engine; a global dispatch queue feeds the worker roles; Azure Blob storage holds the BLAST databases, NCBI databases, and temporary data; a database-updating role refreshes the databases; a splitting task fans out BLAST tasks that are combined by a merging task.]

• The web portal and web service are an ASP.NET program hosted by a web role instance:
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization is based on Live ID
• An accepted job is stored in the job registry table, which avoids in-memory state and aids fault tolerance
[Diagram labels: Web Portal, Web Service, Job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 minutes
  • Against ~5,000 proteins from another strain: completed in less than 30 seconds
• AzureBLAST significantly saved computing time

Discovering homologs
• Discover the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
  • The protein database is large (42 GB in size)
  • In total, 9,865,668 sequences to be queried
  • Theoretically 100 billion sequence comparisons
• Performance estimation: based on sample runs on one extra-large Azure instance, the job would require 3,216,731 minutes (6.1 years) on one desktop
• One of the biggest BLAST jobs as far as we know; this scale of experiment is usually infeasible for most scientists
• Allocated roughly 4,000 cores in total: 475 extra-large VMs (8 cores per VM) across four data centers, US (2), Western Europe, and North Europe
  • 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, the load is redistributed manually


• The total size of the output result is ~230 GB, with 1,764,579,487 total hits
• The run started on March 25th and the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6–8 days
• We looked into the log data to analyze what took place


A normal log record should look like this (otherwise something is wrong, e.g. a task failed to complete):

  3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
  3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
  3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
  3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
  3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
  3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

An abnormal trace, where a task never reports completion:

  3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
  3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
  3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group; this is an update domain at work
  • Outages of ~30 minutes, ~6 nodes in one group
• 35 nodes experienced blob-writing failures at the same time
West Europe data center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." – Irish proverb

Penman-Monteith (1964):

    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

where
  ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
  Δ  = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
  λv = latent heat of vaporization (J/g)
  Rn = net radiation (W m⁻²)
  cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
  ρa = dry air density (kg m⁻³)
  δq = vapor pressure deficit (Pa)
  ga = conductivity of air (inverse of ra) (m s⁻¹)
  gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
  γ  = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

          Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
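As a small worked illustration of the Penman-Monteith formula above, the sketch below just evaluates the expression; the input values are arbitrary, physically plausible placeholders, not data from the project.

    using System;

    class PenmanMonteith
    {
        // Computes ET from the Penman-Monteith terms defined above.
        static double ComputeEt(double delta, double rn, double rhoA, double cp,
                                double dq, double ga, double gs, double gamma, double lambdaV)
        {
            double numerator = delta * rn + rhoA * cp * dq * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }

        static void Main()
        {
            // Arbitrary illustrative inputs, in the units listed above.
            double et = ComputeEt(delta: 145.0, rn: 400.0, rhoA: 1.2, cp: 1013.0,
                                  dq: 1000.0, ga: 0.02, gs: 0.01, gamma: 66.0, lambdaV: 2260.0);
            Console.WriteLine("ET = " + et);
        }
    }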

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms
Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use
Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS service web role portal receives requests; a request queue and download queue feed the data collection stage, which pulls source imagery from the download sites and records source metadata; a reprojection queue feeds the reprojection stage; Reduction 1 and Reduction 2 queues feed the derivation reduction and analysis reduction stages; scientists download the scientific results.]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure service is the web role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated worker role:
  • Parses all job requests into tasks, the recoverable units of work
  • Persists the execution status of all jobs and tasks in Azure Tables
[Diagram: a <PipelineStage> request arrives at the MODISAzure service (web role), which persists <PipelineStage> job status and enqueues to the <PipelineStage> job queue; the Service Monitor (worker role) parses and persists <PipelineStage> task status and dispatches to the <PipelineStage> task queue.]

• All work is actually done by a GenericWorker (worker role):
  • Sandboxes the science or other executable
  • Marshals all storage to and from Azure blob storage and local files on the Azure worker instance
  • Dequeues tasks created by the Service Monitor
  • Retries failed tasks 3 times
  • Maintains all task status
[Diagram: the Service Monitor parses and persists <PipelineStage> task status; GenericWorker instances take work from the <PipelineStage> task queue and read the <Input> data storage.]

Reprojection Request
• The request flows through the Service Monitor (worker role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work from the job queue to the task queue consumed by GenericWorker instances.
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
• Reprojection data storage holds the output tiles; swath source data storage holds the input

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Approximate cost per pipeline stage (total: $1420), behind the AzureMODIS service web role portal:

  Stage                 | Data volume | Files     | Compute              | Workers | Cost
  Data collection       | 400-500 GB  | 60K files | 10 MB/sec, 11 hours  | <10     | $50 upload, $450 storage
  Reprojection          | 400 GB      | 45K files | 3500 hours           | 20-100  | $420 cpu, $60 download
  Derivation reduction  | 5-7 GB      | 55K files | 1800 hours           | 20-100  | $216 cpu, $1 download, $6 storage
  Analysis reduction    | <10 GB      | ~1K files | 1800 hours           | 20-100  | $216 cpu, $2 download, $9 storage

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

          • Day 2 - Azure as a PaaS
          • Day 2 - Applications

            Key Components Fabric Controller

            bull Manages hardware and virtual machines for service

            Compute bull Web Roles

            bull Web application front end

            bull Worker Roles bull Utility compute

            bull VM Roles bull Custom compute role bull You own and customize the VM

            Storage bull Blobs

            bull Binary objects

            bull Tables bull Entity storage

            bull Queues bull Role coordination

            bull SQL Azure bull SQL in the cloud

            Key Components Fabric Controller

            bull Think of it as an automated IT department

            bull ldquoCloud Layerrdquo on top of

            bull Windows Server 2008

            bull A custom version of Hyper-V called the Windows Azure Hypervisor

            bull Allows for automated management of virtual machines

            Key Components Fabric Controller

            bull Think of it as an automated IT department bull ldquoCloud Layerrdquo on top of

            bull Windows Server 2008

            bull A custom version of Hyper-V called the Windows Azure Hypervisor bull Allows for automated management of virtual machines

            bull Itrsquos job is to provision deploy monitor and maintain applications in data centers

            bull Applications have a ldquoshaperdquo and a ldquoconfigurationrdquo bull The configuration definition describes the shape of a service

            bull Role types

            bull Role VM sizes

            bull External and internal endpoints

            bull Local storage

            bull The configuration settings configures a service bull Instance count

            bull Storage keys

            bull Application-specific settings

            Key Components Fabric Controller

            bull Manages ldquonodesrdquo and ldquoedgesrdquo in the ldquofabricrdquo (the hardware) bull Power-on automation devices bull Routers Switches bull Hardware load balancers bull Physical servers bull Virtual servers

            bull State transitions bull Current State bull Goal State bull Does what is needed to reach and maintain the goal state

            bull Itrsquos a perfect IT employee bull Never sleeps bull Doesnrsquot ever ask for raise bull Always does what you tell it to do in configuration definition and settings

            Creating a New Project

            Windows Azure Compute

            Key Components ndash Compute Web Roles

            Web Front End

            bull Cloud web server

            bull Web pages

            bull Web services

            You can create the following types

            bull ASPNET web roles

            bull ASPNET MVC 2 web roles

            bull WCF service web roles

            bull Worker roles

            bull CGI-based web roles

            Key Components ndash Compute Worker Roles

            bull Utility compute

            bull Windows Server 2008

            bull Background processing

            bull Each role can define an amount of local storage

            bull Protected space on the local drive considered volatile storage

            bull May communicate with outside services

            bull Azure Storage

            bull SQL Azure

            bull Other Web services

            bull Can expose external and internal endpoints

            Suggested Application Model Using queues for reliable messaging

            Scalable Fault Tolerant Applications

            Queues are the application glue bull Decouple parts of application easier to scale independently bull Resource allocation different priority queues and backend servers bull Mask faults in worker roles (reliable messaging)

            Key Components ndash Compute VM Roles

            bull Customized Role

            bull You own the box

            bull How it works

            bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

            bull Customize the OS as you need to

            bull Upload the differences VHD

            bull Azure runs your VM role using

            bull Base OS

            bull Differences VHD

            Application Hosting

            lsquoGrokkingrsquo the service model

            bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

            bull The service model is the same diagram written down in a declarative format

            bull You give the Fabric the service model and the binaries that go with each of those nodes

            bull The Fabric can provision deploy and manage that diagram for you

            bull Find hardware home

            bull Copy and launch your app binaries

            bull Monitor your app and the hardware

            bull In case of failure take action Perhaps even relocate your app

            bull At all times the lsquodiagramrsquo stays whole

            Automated Service Management Provide code + service model

            bull Platform identifies and allocates resources deploys the service manages service health

            bull Configuration is handled by two files

            ServiceDefinitioncsdef

            ServiceConfigurationcscfg

            Service Definition

            Service Configuration

            GUI

            Double click on Role Name in Azure Project

            Deploying to the cloud

            bull We can deploy from the portal or from script

            bull VS builds two files

            bull Encrypted package of your code

            bull Your config file

            bull You must create an Azure account then a service and then you deploy your code

            bull Can take up to 20 minutes

            bull (which is better than six months)

            Service Management API

            bullREST based API to manage your services

            bullX509-certs for authentication

            bullLets you create delete change upgrade swaphellip

            bullLots of community and MSFT-built tools around the API

            - Easy to roll your own

            The Secret Sauce ndash The Fabric

            The Fabric is the lsquobrainrsquo behind Windows Azure

            1 Process service model

            1 Determine resource requirements

            2 Create role images

            2 Allocate resources

            3 Prepare nodes

            1 Place role images on nodes

            2 Configure settings

            3 Start roles

            4 Configure load balancers

            5 Maintain service health

            1 If role fails restart the role based on policy

            2 If node fails migrate the role based on policy

            Storage

            Durable Storage At Massive Scale

            Blob

            - Massive files eg videos logs

            Drive

            - Use standard file system APIs

            Tables - Non-relational but with few scale limits

            - Use SQL Azure for relational data

            Queues

            - Facilitate loosely-coupled reliable systems

            Blob Features and Functions

            bull Store Large Objects (up to 1TB in size)

            bull Can be served through Windows Azure CDN service

            bull Standard REST Interface

            bull PutBlob

            bull Inserts a new blob overwrites the existing blob

            bull GetBlob

            bull Get whole blob or a specific range

            bull DeleteBlob

            bull CopyBlob

            bull SnapshotBlob

            bull LeaseBlob

            Two Types of Blobs Under the Hood

            bull Block Blob

            bull Targeted at streaming workloads

            bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

            bull Size limit 200GB per blob

            bull Page Blob

            bull Targeted at random readwrite workloads

            bull Each blob consists of an array of pages bull Each page is identified by its offset

            from the start of the blob

            bull Size limit 1TB per blob

            Windows Azure Drive

            bull Provides a durable NTFS volume for Windows Azure applications to use

            bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

            bull Enables migrating existing NTFS applications to the cloud

            bull A Windows Azure Drive is a Page Blob

            bull Example mount Page Blob as X bull httpltaccountnamegtblobcorewindowsnetltcontainernamegtltblobnamegt

            bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

            bull Drive persists even when not mounted as a Page Blob

            Windows Azure Tables

            bull Provides Structured Storage

            bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

            bull Can use thousands of servers as traffic grows

            bull Highly Available amp Durable bull Data is replicated several times

            bull Familiar and Easy to use API

            bull WCF Data Services and OData bull NET classes and LINQ

            bull REST ndash with any platform or language

            Windows Azure Queues

            bull Queue are performance efficient highly available and provide reliable message delivery

            bull Simple asynchronous work dispatch

            bull Programming semantics ensure that a message can be processed at least once

            bull Access is provided via REST

            Storage Partitioning

            Understanding partitioning is key to understanding performance

            bull Different for each data type (blobs entities queues) Every data object has a partition key

            bull A partition can be served by a single server

            bull System load balances partitions based on traffic pattern

            bull Controls entity locality Partition key is unit of scale

            bull Load balancing can take a few minutes to kick in

            bull Can take a couple of seconds for partition to be available on a different server

            System load balances

            bull Use exponential backoff on ldquoServer Busyrdquo

            bull Our system load balances to meet your traffic needs

            bull Single partition limits have been reached Server Busy

            Partition Keys In Each Abstraction

            bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

            PartitionKey (CustomerId) RowKey (RowKind)

            Name CreditCardNumber OrderTotal

            1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

            1 Order ndash 1 $3512

            2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

            2 Order ndash 3 $1000

            bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

            bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

            Container Name Blob Name

            image annarborbighousejpg

            image foxboroughgillettejpg

            video annarborbighousejpg

            Queue Message

            jobs Message1

            jobs Message2

            workflow Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit the application will see '503 Server Busy'; applications should implement exponential backoff (see the sketch below).
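The back-off advice can be captured in a small helper. This is a minimal sketch, not the storage client's built-in retry policy; RetryHelper, the delays, and the broad catch are illustrative, and real code would inspect the storage exception for a 503 / "Server Busy" status before retrying.

    // Minimal exponential back-off sketch for '503 Server Busy' style failures.
    using System;
    using System.Threading;

    public static class RetryHelper
    {
        public static T ExecuteWithBackoff<T>(Func<T> operation, int maxAttempts)
        {
            TimeSpan delay = TimeSpan.FromMilliseconds(500);       // initial back-off
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return operation();                            // e.g. a table query or blob upload
                }
                catch (Exception)                                  // in real code: check for 503 / Server Busy
                {
                    if (attempt >= maxAttempts) throw;
                    Thread.Sleep(delay);
                    delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);  // double the wait each retry
                }
            }
        }
    }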

Partitions and Partition Ranges

[Figure: a Movies table keyed on PartitionKey (Category) and RowKey (Title), with Timestamp and ReleaseDate properties, shown first split across two servers by partition range and then as the full key range.]

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A continuation token is returned (see the sketch below):
• At a maximum of 1000 rows in a response
• At the end of a partition range boundary
• At a maximum of 5 seconds of query execution time
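In the StorageClient library of this era, the easiest way to honor continuation tokens is to wrap the query with AsTableServiceQuery() and enumerate Execute(), which requests the next page whenever a token comes back. A minimal sketch, assuming the AttendeeEntity type and TableServiceContext used in the code samples later in this section:

    // Sketch: letting CloudTableQuery follow continuation tokens automatically.
    using System.Collections.Generic;
    using Microsoft.WindowsAzure.StorageClient;

    public IEnumerable<AttendeeEntity> ReadAllAttendeesSafely(TableServiceContext context, string tableName)
    {
        CloudTableQuery<AttendeeEntity> query =
            context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

        // Execute() transparently requests the next page whenever the service
        // returns a continuation token (1000-row limit, partition boundary, 5 s limit).
        foreach (AttendeeEntity attendee in query.Execute())
        {
            yield return attendee;
        }
    }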

Tables Recap

• Select a PartitionKey and RowKey that help you scale – they make the table efficient for frequently used queries, support batch transactions, and distribute load
• Avoid "append only" patterns – distribute writes by using a hash or similar value as a key prefix
• Always handle continuation tokens – expect them for any range query
• "OR" predicates are not optimized – execute the queries that form the "OR" predicate as separate queries
• Implement a back-off strategy for retries – "Server Busy" means the load on a single partition has exceeded its limits and the system is load balancing partitions to meet your traffic needs

WCF Data Services

• Use a new context for each logical operation (see the sketch below)
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
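A minimal sketch of those tips together, assuming the CloudTableClient, table name, and AttendeeEntity type from the samples later in this section; TryGetAttendee and the "SummerSchool" partition are illustrative:

    // Sketch: one short-lived context per logical operation, tolerating missing rows.
    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    public AttendeeEntity TryGetAttendee(CloudTableClient tableClient, string tableName, string email)
    {
        TableServiceContext context = tableClient.GetDataServiceContext(); // fresh context for this operation
        context.IgnoreResourceNotFoundException = true;                    // missing entities stop being exceptions

        var query = (from a in context.CreateQuery<AttendeeEntity>(tableName)
                     where a.PartitionKey == "SummerSchool" && a.RowKey == email
                     select a).AsTableServiceQuery();

        return query.Execute().FirstOrDefault();                           // null if the attendee does not exist
    }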

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together – tight coupling leads to brittleness
• Decoupling roles with a queue aids scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8KB in size
  • Larger work items commonly use the work ticket pattern (see the sketch after this list)
• Why not simply use a table?
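A sketch of the work ticket pattern under those constraints: the payload goes to a blob and the queue message carries only a reference to it. The container and queue names and the EnqueueWorkTicket helper are illustrative, not a fixed API.

    // Sketch of the work ticket pattern: the payload lives in blob storage and the
    // queue message is only a small "ticket" naming that blob.
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public void EnqueueWorkTicket(CloudStorageAccount account, string payload)
    {
        // 1. Write the (possibly large) payload to a blob
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("work-items");
        container.CreateIfNotExist();
        string blobName = System.Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadText(payload);

        // 2. Enqueue a small work ticket that just names the blob (well under the 8KB limit)
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("work-tickets");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(blobName));
    }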

Queue Terminology

[Figure: message lifecycle – a Web Role calls PutMessage to add Msg 1–4 to a queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue messages and RemoveMessage to delete them once processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
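The same lifecycle through the StorageClient library rather than raw REST, as a minimal sketch; the queue name and 30-second visibility timeout are illustrative:

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public void ProcessOneMessage(CloudStorageAccount account)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("myqueue");

        // Dequeue with a 30-second visibility timeout; the message stays in the queue
        // but is invisible to other consumers until the timeout expires.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;              // queue was empty

        // ... do the work represented by msg.AsString ...

        // Deleting before the timeout expires is what makes processing "at least once".
        queue.DeleteMessage(msg);
    }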

Truncated Exponential Back Off Polling

Consider a back-off polling approach (see the sketch below):
• Each empty poll increases the polling interval by 2x
• A successful poll sets the interval back to 1
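A minimal sketch of that polling loop; the 1-second floor and 60-second ceiling are illustrative choices:

    // Sketch of truncated exponential back-off polling for an Azure queue.
    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    public void PollQueue(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(60);   // "truncated" – never wait longer than this

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                // ... process msg ...
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);        // successful poll: reset the interval
            }
            else
            {
                Thread.Sleep(interval);                    // empty poll: wait, then double the interval
                interval = TimeSpan.FromSeconds(
                    Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
            }
        }
    }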

Removing Poison Messages

[Figure: producers P1 and P2 feed a queue; consumers C1 and C2 dequeue from it. The sequence below is shown across three animation steps, with each message carrying a dequeue count. A consumer sketch follows.]

1.  C1: GetMessage(Q, 30 s) → msg 1
2.  C2: GetMessage(Q, 30 s) → msg 2
3.  C2 consumed msg 2
4.  C2: DeleteMessage(Q, msg 2)
5.  C1 crashed
6.  msg 1 becomes visible again 30 s after its dequeue
7.  C2: GetMessage(Q, 30 s) → msg 1
8.  C2 crashed
9.  msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount is now > 2 – treat it as a poison message
13. C1: DeleteMessage(Q, msg 1)
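A sketch of the resulting consumer logic, using DequeueCount to discard poison messages; the threshold and the idea of parking bad messages elsewhere are illustrative:

    // Sketch: using DequeueCount to remove poison messages.
    using Microsoft.WindowsAzure.StorageClient;

    public void ProcessWithPoisonHandling(CloudQueue queue, int maxDequeueCount)
    {
        CloudQueueMessage msg = queue.GetMessage(System.TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > maxDequeueCount)
        {
            // The message has failed repeatedly – delete it
            // (optionally log it or park it in a "dead letter" blob or queue first).
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal, idempotent processing of msg ...
        queue.DeleteMessage(msg);
    }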

Queues Recap

• No need to deal with failures – make message processing idempotent
• Invisible messages result in out-of-order delivery – do not rely on order
• Enforce a threshold on a message's dequeue count – use the dequeue count to remove poison messages
• Messages > 8KB – use a blob to store the message data, with a reference to it in the message (batch messages where possible, and garbage collect orphaned blobs)
• Dynamically increase/reduce workers – use the message count to scale

Windows Azure Storage Takeaways

Blobs, Drives, Tables, and Queues each fill a distinct role.

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

            A Quick Exercise

            hellipThen letrsquos look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();

            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

            Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting up code into multiple roles, each not using much CPU
• Balance using up CPU vs. having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (see the sketch below)
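A minimal .NET 4 sketch of both flavors; the work items and helper methods are purely illustrative:

    // Data parallelism with Parallel.ForEach, task parallelism with Task.
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class ConcurrencyExamples
    {
        public static void ProcessAll(IEnumerable<string> workItems)
        {
            // Data parallelism: the same operation applied to every item, scheduled across cores
            Parallel.ForEach(workItems, item => ProcessItem(item));

            // Task parallelism: independent operations running concurrently
            Task cleanup = Task.Factory.StartNew(() => CleanUpLocalStorage());
            Task report  = Task.Factory.StartNew(() => UploadDiagnostics());
            Task.WaitAll(cleanup, report);
        }

        private static void ProcessItem(string item) { /* ... */ }
        private static void CleanUpLocalStorage() { /* ... */ }
        private static void UploadDiagnostics() { /* ... */ }
    }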

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, or storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• It is a trade-off between performance and cost: the risk of failure or poor user experience from not having excess capacity vs. the cost of idling VMs

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compression help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's bill
• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth often leads to savings in other places
• Sending fewer things also means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Figure: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]
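A minimal sketch of gzip-compressing content before storing or sending it, using GZipStream from the .NET base class library; compressing a string in memory is an illustrative choice (for web output you would typically enable compression in IIS instead):

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class CompressionHelper
    {
        public static byte[] GzipString(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (MemoryStream output = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);    // compressed bytes land in the MemoryStream
                }
                return output.ToArray();               // gzip stream is flushed and closed at this point
            }
        }
    }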

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • A Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research is announcing this week a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially: GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST): needs special result-reduction processing

Large volume data
• A normal BLAST database can be as large as 10GB
• 100 nodes means the peak storage bandwidth could reach 1TB
• The output of BLAST is usually 10–100x larger than the input

• A parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

• Leverage the multiple cores of one instance
  • The "-a" argument of NCBI-BLAST: 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes
• Task granularity
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data transfer overhead)
  • Best practice: use test runs to profile, and set the partition size to mitigate the overhead
• Value of visibilityTimeout for each BLAST task
  • Essentially an estimate of the task run time
  • Too small: repeated computation; too large: an unnecessarily long wait in case of instance failure
  • Best practice: estimate the value based on the number of pair-bases in the partition and test runs, and watch out for the 2-hour maximum limitation (see the sketch after this list)
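A sketch of turning such an estimate into a visibility timeout clamped to the 2-hour maximum; the padding factor and the seconds-per-million-pair-bases rate are illustrative:

    using System;

    public static class TimeoutEstimator
    {
        private static readonly TimeSpan MaxVisibilityTimeout = TimeSpan.FromHours(2);

        public static TimeSpan ForPartition(long pairBasesInPartition, double secondsPerMillionPairBases)
        {
            double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;
            double padded = estimatedSeconds * 1.5;   // headroom so healthy tasks are not re-dispatched
            return TimeSpan.FromSeconds(Math.Min(padded, MaxVisibilityTimeout.TotalSeconds));
        }
    }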

[Figure: a splitting task fans out into many BLAST tasks that run in parallel, followed by a merging task.]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[Figure: AzureBLAST architecture – a web portal and web service (Web Role) handle job registration; a Job Management Role runs the job scheduler and scaling engine, recording jobs in a job registry (Azure Table) and dispatching splitting, BLAST, and merging tasks to worker roles through a global dispatch queue; a database-updating role maintains the NCBI/BLAST databases and temporary data in Azure Blob storage.]

Job Portal
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

R. palustris as a platform for H2 production – Eric Schadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
  • The protein database is large (4.2 GB)
  • 9,865,668 sequences to be queried in total
  • Theoretically 100 billion sequence comparisons
• Performance estimation
  • Based on sampling runs on one extra-large Azure instance
  • Would require 3,216,731 minutes (6.1 years) on one desktop
• One of the biggest BLAST jobs we know of – this scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment was submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occurred, the load was redistributed manually


• The total size of the output result is ~230GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record looks like this:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an update domain
  • ~30 minutes, ~6 nodes in one group
• 35 nodes experienced blob writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, then the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants. It is estimated with the Penman-Monteith (1964) equation:

    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

where
    ET = water volume evapotranspired (m3 s-1 m-2)
    Δ  = rate of change of saturation specific humidity with air temperature (Pa K-1)
    λv = latent heat of vaporization (J/g)
    Rn = net radiation (W m-2)
    cp = specific heat capacity of air (J kg-1 K-1)
    ρa = dry air density (kg m-3)
    δq = vapor pressure deficit (Pa)
    ga = conductivity of air (inverse of ra) (m s-1)
    gs = conductivity of plant stoma to air (inverse of rs) (m s-1)
    γ  = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100MB (4K files)
• Vegetative clumping: ~5MB (1 file)
• Climate classification: ~1MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Figure: MODISAzure pipeline – the AzureMODIS Service Web Role Portal accepts requests; a request queue feeds the data collection stage, which downloads source imagery; download, reprojection, reduction 1, and reduction 2 queues drive the reprojection, derivation reduction, and analysis reduction stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Figure: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage>Task Queue.]

• All work is actually done by a GenericWorker (Worker Role)
  • Sandboxes the science or other executable
  • Marshalls all storage to/from Azure blob storage and local Azure Worker instance files
  • Dequeues tasks created by the Service Monitor
  • Retries failed tasks 3 times
  • Maintains all task status

[Figure: the Service Monitor (Worker Role) persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]

[Figure: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches work via the job and task queues to GenericWorker (Worker Role) instances, which read swath source data storage and write reprojection data storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Figure: the MODISAzure pipeline annotated with approximate scale and cost per stage.]

• Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
• Derivation reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

            • Day 2 - Azure as a PaaS
            • Day 2 - Applications



              Windows Azure Drive

              bull Provides a durable NTFS volume for Windows Azure applications to use

              bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

              bull Enables migrating existing NTFS applications to the cloud

              bull A Windows Azure Drive is a Page Blob

              bull Example mount Page Blob as X bull httpltaccountnamegtblobcorewindowsnetltcontainernamegtltblobnamegt

              bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

              bull Drive persists even when not mounted as a Page Blob

              Windows Azure Tables

              bull Provides Structured Storage

              bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

              bull Can use thousands of servers as traffic grows

              bull Highly Available amp Durable bull Data is replicated several times

              bull Familiar and Easy to use API

              bull WCF Data Services and OData bull NET classes and LINQ

              bull REST ndash with any platform or language

              Windows Azure Queues

              bull Queue are performance efficient highly available and provide reliable message delivery

              bull Simple asynchronous work dispatch

              bull Programming semantics ensure that a message can be processed at least once

              bull Access is provided via REST

              Storage Partitioning

              Understanding partitioning is key to understanding performance

              bull Different for each data type (blobs entities queues) Every data object has a partition key

              bull A partition can be served by a single server

              bull System load balances partitions based on traffic pattern

              bull Controls entity locality Partition key is unit of scale

              bull Load balancing can take a few minutes to kick in

              bull Can take a couple of seconds for partition to be available on a different server

              System load balances

              bull Use exponential backoff on ldquoServer Busyrdquo

              bull Our system load balances to meet your traffic needs

              bull Single partition limits have been reached Server Busy

              Partition Keys In Each Abstraction

              bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

              PartitionKey (CustomerId) RowKey (RowKind)

              Name CreditCardNumber OrderTotal

              1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

              1 Order ndash 1 $3512

              2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

              2 Order ndash 3 $1000

              bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

              bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

              Container Name Blob Name

              image annarborbighousejpg

              image foxboroughgillettejpg

              video annarborbighousejpg

              Queue Message

              jobs Message1

              jobs Message2

              workflow Message1

              Scalability Targets

              Storage Account

              bull Capacity ndash Up to 100 TBs

              bull Transactions ndash Up to a few thousand requests per second

              bull Bandwidth ndash Up to a few hundred megabytes per second

              Single QueueTable Partition

              bull Up to 500 transactions per second

              To go above these numbers partition between multiple storage accounts and partitions

              When limit is hit app will see lsquo503 server busyrsquo applications should implement exponential backoff

              Single Blob Partition

              bull Throughput up to 60 MBs

              PartitionKey (Category)

              RowKey (Title)

              Timestamp ReleaseDate

              Action Fast amp Furious hellip 2009

              Action The Bourne Ultimatum hellip 2007

              hellip hellip hellip hellip

              Animation Open Season 2 hellip 2009

              Animation The Ant Bully hellip 2006

              PartitionKey (Category)

              RowKey (Title)

              Timestamp ReleaseDate

              Comedy Office Space hellip 1999

              hellip hellip hellip hellip

              SciFi X-Men Origins Wolverine hellip 2009

              hellip hellip hellip hellip

              War Defiance hellip 2008

              PartitionKey (Category)

              RowKey (Title)

              Timestamp ReleaseDate

              Action Fast amp Furious hellip 2009

              Action The Bourne Ultimatum hellip 2007

              hellip hellip hellip hellip

              Animation Open Season 2 hellip 2009

              Animation The Ant Bully hellip 2006

              hellip hellip hellip hellip

              Comedy Office Space hellip 1999

              hellip hellip hellip hellip

              SciFi X-Men Origins Wolverine hellip 2009

              hellip hellip hellip hellip

              War Defiance hellip 2008

              Partitions and Partition Ranges

              Key Selection Things to Consider

              bullDistribute load as much as possible bullHot partitions can be load balanced bullPartitionKey is critical for scalability

              See httpwwwmicrosoftpdccom2009SVC09 and httpazurescopecloudappnet for more information

              bull Avoid frequent large scans bull Parallelize queries bull Point queries are most efficient

              bullTransactions across a single partition bullTransaction semantics amp Reduce round trips

              Scalability

              Query Efficiency amp Speed

              Entity group transactions

              Expect Continuation Tokens ndash Seriously

              Maximum of 1000 rows in a response

              At the end of partition range boundary

              Maximum of 1000 rows in a response

              At the end of partition range boundary

              Maximum of 5 seconds to execute the query

              Tables Recap bullEfficient for frequently used queries

              bullSupports batch transactions

              bullDistributes load

              Select PartitionKey and RowKey that help scale

              Avoid ldquoAppend onlyrdquo patterns

              Always Handle continuation tokens

              ldquoORrdquo predicates are not optimized

              Implement back-off strategy for retries

              bullDistribute by using a hash etc as prefix

              bullExpect continuation tokens for range queries

              bullExecute the queries that form the ldquoORrdquo predicates as separate queries

              bullServer busy

              bullLoad balance partitions to meet traffic needs

              bullLoad on single partition has exceeded the limits

              WCF Data Services

              bullUse a new context for each logical operation

              bullAddObjectAttachTo can throw exception if entity is already being tracked

              bullPoint query throws an exception if resource does not exist Use IgnoreResourceNotFoundException

              Queues Their Unique Role in Building Reliable Scalable Applications

              bull Want roles that work closely together but are not bound together bull Tight coupling leads to brittleness

              bull This can aid in scaling and performance

              bull A queue can hold an unlimited number of messages bull Messages must be serializable as XML

              bull Limited to 8KB in size

              bull Commonly use the work ticket pattern

              bull Why not simply use a table

              Queue Terminology

              Message Lifecycle

              Queue

              Msg 1

              Msg 2

              Msg 3

              Msg 4

              Worker Role

              Worker Role

              PutMessage

              Web Role

              GetMessage (Timeout) RemoveMessage

              Msg 2 Msg 1

              Worker Role

              Msg 2

              POST httpmyaccountqueuecorewindowsnetmyqueuemessages

              HTTP11 200 OK Transfer-Encoding chunked Content-Type applicationxml Date Tue 09 Dec 2008 210430 GMT Server Nephos Queue Service Version 10 Microsoft-HTTPAPI20

              ltxml version=10 encoding=utf-8gt ltQueueMessagesListgt ltQueueMessagegt ltMessageIdgt5974b586-0df3-4e2d-ad0c-18e3892bfca2ltMessageIdgt ltInsertionTimegtMon 22 Sep 2008 232920 GMTltInsertionTimegt ltExpirationTimegtMon 29 Sep 2008 232920 GMTltExpirationTimegt ltPopReceiptgtYzQ4Yzg1MDIGM0MDFiZDAwYzEwltPopReceiptgt ltTimeNextVisiblegtTue 23 Sep 2008 052920GMTltTimeNextVisiblegt ltMessageTextgtPHRlc3Q+dGdGVzdD4=ltMessageTextgt ltQueueMessagegt ltQueueMessagesListgt

              DELETE httpmyaccountqueuecorewindowsnetmyqueuemessagesmessageidpopreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

              Truncated Exponential Back Off Polling

              Consider a backoff polling approach Each empty poll

              increases interval by 2x

              A successful sets the interval back to 1

              44

              2 1

              1 1

              C1

              C2

              Removing Poison Messages

              1 1

              2 1

              3 4 0

              Producers Consumers

              P2

              P1

              3 0

              2 GetMessage(Q 30 s) msg 2

              1 GetMessage(Q 30 s) msg 1

              1 1

              2 1

              1 0

              2 0

              45

              C1

              C2

              Removing Poison Messages

              3 4 0

              Producers Consumers

              P2

              P1

              1 1

              2 1

              2 GetMessage(Q 30 s) msg 2 3 C2 consumed msg 2 4 DeleteMessage(Q msg 2) 7 GetMessage(Q 30 s) msg 1

              1 GetMessage(Q 30 s) msg 1 5 C1 crashed

              1 1

              2 1

              6 msg1 visible 30 s after Dequeue 3 0

              1 2

              1 1

              1 2

              46

              C1

              C2

              Removing Poison Messages

              3 4 0

              Producers Consumers

              P2

              P1

              1 2

              2 Dequeue(Q 30 sec) msg 2 3 C2 consumed msg 2 4 Delete(Q msg 2) 7 Dequeue(Q 30 sec) msg 1 8 C2 crashed

              1 Dequeue(Q 30 sec) msg 1 5 C1 crashed 10 C1 restarted 11 Dequeue(Q 30 sec) msg 1 12 DequeueCount gt 2 13 Delete (Q msg1) 1

              2

              6 msg1 visible 30s after Dequeue 9 msg1 visible 30s after Dequeue

              3 0

              1 3

              1 2

              1 3

              Queues Recap

              bullNo need to deal with failures Make message

              processing idempotent

              bullInvisible messages result in out of order Do not rely on order

              bullEnforce threshold on messagersquos dequeue count Use Dequeue count to remove

              poison messages

              bullMessages gt 8KB

              bullBatch messages

              bullGarbage collect orphaned blobs

              bullDynamically increasereduce workers

              Use blob to store message data with

              reference in message

              Use message count to scale

              bullNo need to deal with failures

              bullInvisible messages result in out of order

              bullEnforce threshold on messagersquos dequeue count

              bullDynamically increasereduce workers

              Windows Azure Storage Takeaways

              Blobs

              Drives

              Tables

              Queues

              httpblogsmsdncomwindowsazurestorage

              httpazurescopecloudappnet

              49

              A Quick Exercise

              hellipThen letrsquos look at some code and some tools

              50

              Code ndash AccountInformationcs public class AccountInformation private static string storageKey = ldquotHiSiSnOtMyKeY private static string accountName = jjstore private static StorageCredentialsAccountAndKey credentials internal static StorageCredentialsAccountAndKey Credentials get if (credentials == null) credentials = new StorageCredentialsAccountAndKey(accountName storageKey) return credentials

              51

              Code ndash BlobHelpercs public class BlobHelper private static string defaultContainerName = school private CloudBlobClient client = null private CloudBlobContainer container = null private void InitContainer() if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudBlobClient() container = clientGetContainerReference(defaultContainerName) containerCreateIfNotExist() BlobContainerPermissions permissions = containerGetPermissions() permissionsPublicAccess = BlobContainerPublicAccessTypeContainer containerSetPermissions(permissions)

              52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

              53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available.
        return null;
    }
}

              54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>; in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

              55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

              56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

              57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

              58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

              59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))   // nothing to delete if the attendee is not known
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

              60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

              61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

              62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

              63

Tools – Fiddler2

              Best Practices

              Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

              Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU vs. having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

              Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (a short sketch follows this list)
• Data parallelism
• Task parallelism
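As a minimal illustration of the two styles just listed (not code from the deck), a .NET 4 Task Parallel Library sketch; the computations inside are placeholders:

using System;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    static void Main()
    {
        // Data parallelism: spread one loop across all cores of the role instance.
        double[] input = new double[1000000];
        double[] output = new double[input.Length];
        Parallel.For(0, input.Length, i => { output[i] = Math.Sqrt(input[i] + i); });

        // Task parallelism: run independent pieces of work concurrently and wait for both.
        Task<int> t1 = Task.Factory.StartNew(() => ExpensiveComputation(1));
        Task<int> t2 = Task.Factory.StartNew(() => ExpensiveComputation(2));
        Task.WaitAll(t1, t2);
        Console.WriteLine(t1.Result + t2.Result);
    }

    // Placeholder for real work.
    static int ExpensiveComputation(int seed)
    {
        return seed * 42;
    }
}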

              Finding Good Code Neighbors

• Typically, code falls into one or more of the categories below
• Find pieces of code that are intensive in different resources and let them live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

              Memory Intensive

              CPU Intensive

              Network IO Intensive

              Storage IO Intensive

              Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs

              Performance Cost
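One hedged sketch of the "use message count to scale" advice from the queues recap, again assuming the 1.x StorageClient API; the one-worker-per-100-messages policy and the 2–20 instance bounds are made-up numbers, and applying the result would go through the Service Management API:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ScalingHint
{
    // Hypothetical policy: one worker per 100 queued messages, clamped to [2, 20] instances.
    public static int TargetWorkerCount(CloudStorageAccount account, string queueName)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(queueName);
        int backlog = queue.RetrieveApproximateMessageCount();   // cheap, approximate queue length
        int target = (int)Math.Ceiling(backlog / 100.0);
        return Math.Max(2, Math.Min(20, target));
        // The caller would then update the role's instance count (service configuration)
        // via the Service Management API, keeping in mind that VMs take minutes to start.
    }
}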

              Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app's profile
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app's profile
• Caching and compressing help a lot with storage costs

              Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places.
Sending fewer things also means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

              Compressing Content

1. Gzip all output content (a minimal sketch follows this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
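A minimal sketch of point 1 using the standard System.IO.Compression.GZipStream (not code from the deck); in an ASP.NET web role the same effect can often be achieved by enabling IIS dynamic compression instead:

using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressionHelper
{
    // Gzip a string payload before returning it over the wire or writing it to blob storage.
    public static byte[] GzipString(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The gzip stream must be closed before the buffer holds the complete payload.
            return buffer.ToArray();
        }
    }
}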

[Diagram: Uncompressed Content → (Gzip, minify JavaScript, minify CSS, minify images) → Compressed Content]

              Best Practices Summary

Doing 'less' is the key to saving costs

              Measure everything

              Know your application profile in and out

              Research Examples in the Cloud

…on another set of slides

              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

              76

              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

              77

Questions and Discussion…

              Thank you for hosting me at the Summer School

              BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive:
• Large number of pairwise alignment operations
• A BLAST run can take 700–1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result-reduction processing

Large data volumes:
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With special considerations:
  • batch job management
  • task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.
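To make the Web Role + Queue + Worker pattern concrete, here is a hypothetical worker-role loop for the query-segmentation pattern; the queue and container names, the download/upload flow, and the blastall command line are illustrative assumptions, not AzureBLAST's actual code:

using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlastWorker
{
    public static void Run(CloudStorageAccount account)
    {
        CloudQueue tasks = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        CloudBlobContainer data = account.CreateCloudBlobClient().GetContainerReference("blastdata");

        while (true)
        {
            // Visibility timeout ~ estimated task run time (see the notes below).
            CloudQueueMessage msg = tasks.GetMessage(TimeSpan.FromMinutes(60));
            if (msg == null) { Thread.Sleep(5000); continue; }

            string partition = msg.AsString;                        // message names one input partition blob
            data.GetBlobReference(partition).DownloadToFile(partition);

            // -a = number of processors to use (1/2/4/8 depending on instance size).
            Process blast = Process.Start("blastall.exe",
                "-p blastp -d nr -a 8 -i " + partition + " -o " + partition + ".out");
            blast.WaitForExit();

            data.GetBlobReference(partition + ".out").UploadFile(partition + ".out");
            tasks.DeleteMessage(msg);                               // delete only after the result is stored
        }
    }
}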

A simple Split/Join pattern.

Leverage the multiple cores of one instance:
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity:
• large partitions: load imbalance
• small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task:
• essentially an estimate of the task run time
• too small: repeated computation
• too large: an unnecessarily long wait in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs, and watch out for the 2-hour maximum limitation
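A small sketch of the visibilityTimeout estimate described above; the calibration constant and the safety margin are assumptions that would come from the profiling test runs:

using System;

static class VisibilityTimeoutEstimator
{
    // secondsPerMegaPairBase is a hypothetical calibration value measured in test runs.
    public static TimeSpan Estimate(long pairBasesInPartition, double secondsPerMegaPairBase)
    {
        double seconds = (pairBasesInPartition / 1e6) * secondsPerMegaPairBase;
        seconds *= 1.5;                             // safety margin so slow tasks are not re-dispatched too early
        seconds = Math.Min(seconds, 2 * 60 * 60);   // the queue visibility timeout is capped at 2 hours
        return TimeSpan.FromSeconds(Math.Max(seconds, 30));
    }
}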

[Diagram: a splitting task fans out into many BLAST tasks that run in parallel, followed by a merging task that combines the results]

Task size vs. performance:
• benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance:
• super-linear speedup with larger worker instances
• primarily due to the memory capability

Task size / instance size vs. cost:
• the extra-large instance generated the best and most economical throughput
• fully utilize the resource

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role contains the Job Scheduler, Scaling Engine, and database-updating role; job state and NCBI database metadata are kept in a Job Registry (Azure Table); BLAST databases, temporary data, and results are kept in Azure Blob storage; worker instances pull work from a global dispatch queue. A splitting task fans out into parallel BLAST tasks whose outputs are combined by a merging task.]

ASP.NET program hosted by a web role instance:
• submit jobs
• track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table:
• fault tolerance: avoid in-memory state

[Diagram: Job Portal components – Web Portal, Web Service, Job registration, Job Scheduler, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences):
• against all NCBI non-redundant proteins: completed in 30 min
• against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs:
• discover the interrelationships of known protein sequences

"All against All" query:
• the database is also the input query
• the protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• theoretically, 100 billion sequence comparisons

Performance estimation:
• based on sampling runs on one extra-large Azure instance
• would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know:
• this scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) in four data centers: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • each segment is submitted to one deployment as one job for execution
  • each segment consists of smaller partitions
• When the load imbalances, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should be:

    3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
    3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
    3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
    3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
    3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
    3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

    3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
    3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
    3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 mins): this is an update domain.
• 35 nodes experienced blob-writing failures at the same time.

West Europe data center: 30,976 tasks completed and the job was killed.
• A reasonable guess: the fault domain is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

              ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

              λv = Latent heat of vaporization (Jg)

              Rn = Net radiation (W m-2)

              cp = Specific heat capacity of air (J kg-1 K-1)

              ρa = dry air density (kg m-3)

              δq = vapor pressure deficit (Pa)

              ga = Conductivity of air (inverse of ra) (m s-1)

              gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

              γ = Psychrometric constant (γ asymp 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• lots of inputs: a big data reduction
• some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
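As a minimal sketch, the Penman-Monteith form above transcribed directly into a C# function; inputs are assumed to already be in the units listed in the definitions, and none of the MODISAzure data handling is shown:

static double Evapotranspiration(
    double delta,     // Δ  : rate of change of saturation specific humidity with air temperature (Pa/K)
    double Rn,        // net radiation (W m-2)
    double rhoA,      // ρa : dry air density (kg m-3)
    double cp,        // specific heat capacity of air (J kg-1 K-1)
    double deltaQ,    // δq : vapor pressure deficit (Pa)
    double gA,        // ga : conductivity of air (m s-1)
    double gS,        // gs : conductivity of plant stoma (m s-1)
    double gamma,     // γ  : psychrometric constant (~66 Pa/K)
    double lambdaV)   // λv : latent heat of vaporization (J/g)
{
    double numerator = delta * Rn + rhoA * cp * deltaQ * gA;
    double denominator = (delta + gamma * (1.0 + gA / gS)) * lambdaV;
    return numerator / denominator;
}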

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:
• downloads requested input tiles from NASA FTP sites
• includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:
• converts source tile(s) to intermediate-result sinusoidal tiles
• simple nearest-neighbor or spline algorithms

Derivation reduction stage:
• first stage visible to the scientist
• computes ET in our initial use

Analysis reduction stage:
• optional second stage visible to the scientist
• enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal accepts requests onto a Request Queue; the Data Collection Stage pulls source imagery and metadata from the download sites via a Download Queue; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation and Analysis Reduction Stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door:
  • receives all user requests
  • queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role:
  • parses all job requests into tasks, which are recoverable units of work
  • the execution status of all jobs and tasks is persisted in Tables
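The slides do not show the status schema, but a hypothetical task-status entity in the TableServiceEntity style used earlier in this deck might look like this:

using System;
using Microsoft.WindowsAzure.StorageClient;

public class TaskStatusEntity : TableServiceEntity
{
    // Hypothetical fields; e.g. PartitionKey = job id, RowKey = task id.
    public string Stage { get; set; }        // Download, Reprojection, Reduction1, Reduction2
    public string State { get; set; }        // Queued, Running, Succeeded, Failed
    public int RetryCount { get; set; }      // failed tasks are retried a limited number of times
    public DateTime LastUpdated { get; set; }
}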

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role (the GenericWorker):
• sandboxes the science or other executable
• marshalls all storage to/from Azure blob storage and to/from local Azure Worker instance files
• dequeues tasks created by the Service Monitor
• retries failed tasks 3 times
• maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances dequeue from the <PipelineStage>Task Queue and read/write <Input>Data Storage.]

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work from the Job Queue onto the Task Queue for GenericWorker (Worker Role) instances, which read swath source data storage and write reprojection data storage. Supporting tables: each ReprojectionJobStatus entity specifies a single reprojection job request; each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile); the SwathGranuleMeta table is queried for geo-metadata (e.g., boundaries) for each swath tile; the ScanTimeList table is queried for the list of satellite scan times that cover a target tile.]

• Computational costs are driven by the data scale and by the need to run the reduction multiple times
• Storage costs are driven by the data scale and by the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost figure, per pipeline stage (approximate, as shown on the slide):
• Data collection stage: 400–500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; ~$50 upload, ~$450 storage
• Reprojection stage: 400 GB, 45K files, 3,500 CPU hours, 20–100 workers; ~$420 CPU, ~$60 download
• Derivation reduction stage: 5–7 GB, 55K files, 1,800 CPU hours, 20–100 workers; ~$216 CPU, ~$1 download, ~$6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1,800 CPU hours, 20–100 workers; ~$216 CPU, ~$2 download, ~$9 storage
Total: ~$1,420]

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers


                Key Components Fabric Controller

                bull Think of it as an automated IT department bull ldquoCloud Layerrdquo on top of

                bull Windows Server 2008

                bull A custom version of Hyper-V called the Windows Azure Hypervisor bull Allows for automated management of virtual machines

                bull Itrsquos job is to provision deploy monitor and maintain applications in data centers

                bull Applications have a ldquoshaperdquo and a ldquoconfigurationrdquo bull The configuration definition describes the shape of a service

                bull Role types

                bull Role VM sizes

                bull External and internal endpoints

                bull Local storage

                bull The configuration settings configures a service bull Instance count

                bull Storage keys

                bull Application-specific settings

                Key Components Fabric Controller

                bull Manages ldquonodesrdquo and ldquoedgesrdquo in the ldquofabricrdquo (the hardware) bull Power-on automation devices bull Routers Switches bull Hardware load balancers bull Physical servers bull Virtual servers

                bull State transitions bull Current State bull Goal State bull Does what is needed to reach and maintain the goal state

                bull Itrsquos a perfect IT employee bull Never sleeps bull Doesnrsquot ever ask for raise bull Always does what you tell it to do in configuration definition and settings

                Creating a New Project

                Windows Azure Compute

                Key Components ndash Compute Web Roles

                Web Front End

                bull Cloud web server

                bull Web pages

                bull Web services

                You can create the following types

                bull ASPNET web roles

                bull ASPNET MVC 2 web roles

                bull WCF service web roles

                bull Worker roles

                bull CGI-based web roles

                Key Components ndash Compute Worker Roles

                bull Utility compute

                bull Windows Server 2008

                bull Background processing

                bull Each role can define an amount of local storage

                bull Protected space on the local drive considered volatile storage

                bull May communicate with outside services

                bull Azure Storage

                bull SQL Azure

                bull Other Web services

                bull Can expose external and internal endpoints

                Suggested Application Model Using queues for reliable messaging

                Scalable Fault Tolerant Applications

                Queues are the application glue bull Decouple parts of application easier to scale independently bull Resource allocation different priority queues and backend servers bull Mask faults in worker roles (reliable messaging)

                Key Components ndash Compute VM Roles

                bull Customized Role

                bull You own the box

                bull How it works

                bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

                bull Customize the OS as you need to

                bull Upload the differences VHD

                bull Azure runs your VM role using

                bull Base OS

                bull Differences VHD

                Application Hosting

                lsquoGrokkingrsquo the service model

                bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

                bull The service model is the same diagram written down in a declarative format

                bull You give the Fabric the service model and the binaries that go with each of those nodes

                bull The Fabric can provision deploy and manage that diagram for you

                bull Find hardware home

                bull Copy and launch your app binaries

                bull Monitor your app and the hardware

                bull In case of failure take action Perhaps even relocate your app

                bull At all times the lsquodiagramrsquo stays whole

                Automated Service Management Provide code + service model

                bull Platform identifies and allocates resources deploys the service manages service health

                bull Configuration is handled by two files

                ServiceDefinitioncsdef

                ServiceConfigurationcscfg

                Service Definition

                Service Configuration

                GUI

                Double click on Role Name in Azure Project

                Deploying to the cloud

                bull We can deploy from the portal or from script

                bull VS builds two files

                bull Encrypted package of your code

                bull Your config file

                bull You must create an Azure account then a service and then you deploy your code

                bull Can take up to 20 minutes

                bull (which is better than six months)

                Service Management API

                bullREST based API to manage your services

                bullX509-certs for authentication

                bullLets you create delete change upgrade swaphellip

                bullLots of community and MSFT-built tools around the API

                - Easy to roll your own

                The Secret Sauce ndash The Fabric

                The Fabric is the lsquobrainrsquo behind Windows Azure

                1 Process service model

                1 Determine resource requirements

                2 Create role images

                2 Allocate resources

                3 Prepare nodes

                1 Place role images on nodes

                2 Configure settings

                3 Start roles

                4 Configure load balancers

                5 Maintain service health

                1 If role fails restart the role based on policy

                2 If node fails migrate the role based on policy

                Storage

                Durable Storage At Massive Scale

                Blob

                - Massive files eg videos logs

                Drive

                - Use standard file system APIs

                Tables - Non-relational but with few scale limits

                - Use SQL Azure for relational data

                Queues

                - Facilitate loosely-coupled reliable systems

                Blob Features and Functions

                bull Store Large Objects (up to 1TB in size)

                bull Can be served through Windows Azure CDN service

                bull Standard REST Interface

                bull PutBlob

                bull Inserts a new blob overwrites the existing blob

                bull GetBlob

                bull Get whole blob or a specific range

                bull DeleteBlob

                bull CopyBlob

                bull SnapshotBlob

                bull LeaseBlob

                Two Types of Blobs Under the Hood

                bull Block Blob

                bull Targeted at streaming workloads

                bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

                bull Size limit 200GB per blob

                bull Page Blob

                bull Targeted at random readwrite workloads

                bull Each blob consists of an array of pages bull Each page is identified by its offset

                from the start of the blob

                bull Size limit 1TB per blob

                Windows Azure Drive

                bull Provides a durable NTFS volume for Windows Azure applications to use

                bull Use existing NTFS APIs to access a durable drive bull Durability and survival of data on application failover

                bull Enables migrating existing NTFS applications to the cloud

                bull A Windows Azure Drive is a Page Blob

                bull Example mount Page Blob as X bull httpltaccountnamegtblobcorewindowsnetltcontainernamegtltblobnamegt

                bull All writes to drive are made durable to the Page Blob bull Drive made durable through standard Page Blob replication

                bull Drive persists even when not mounted as a Page Blob

                Windows Azure Tables

                bull Provides Structured Storage

                bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

                bull Can use thousands of servers as traffic grows

                bull Highly Available amp Durable bull Data is replicated several times

                bull Familiar and Easy to use API

                bull WCF Data Services and OData bull NET classes and LINQ

                bull REST ndash with any platform or language

                Windows Azure Queues

                bull Queue are performance efficient highly available and provide reliable message delivery

                bull Simple asynchronous work dispatch

                bull Programming semantics ensure that a message can be processed at least once

                bull Access is provided via REST

                Storage Partitioning

                Understanding partitioning is key to understanding performance

                bull Different for each data type (blobs entities queues) Every data object has a partition key

                bull A partition can be served by a single server

                bull System load balances partitions based on traffic pattern

                bull Controls entity locality Partition key is unit of scale

                bull Load balancing can take a few minutes to kick in

                bull Can take a couple of seconds for partition to be available on a different server

                System load balances

                bull Use exponential backoff on ldquoServer Busyrdquo

                bull Our system load balances to meet your traffic needs

                bull Single partition limits have been reached Server Busy

                Partition Keys In Each Abstraction

                bull Entities w same PartitionKey value served from same partition Entities ndash TableName + PartitionKey

                PartitionKey (CustomerId) RowKey (RowKind)

                Name CreditCardNumber OrderTotal

                1 Customer-John Smith John Smith xxxx-xxxx-xxxx-xxxx

                1 Order ndash 1 $3512

                2 Customer-Bill Johnson Bill Johnson xxxx-xxxx-xxxx-xxxx

                2 Order ndash 3 $1000

                bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

                bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

                Container Name Blob Name

                image annarborbighousejpg

                image foxboroughgillettejpg

                video annarborbighousejpg

                Queue Message

                jobs Message1

                jobs Message2

                workflow Message1

                Scalability Targets

                Storage Account

                bull Capacity ndash Up to 100 TBs

                bull Transactions ndash Up to a few thousand requests per second

                bull Bandwidth ndash Up to a few hundred megabytes per second

                Single QueueTable Partition

                bull Up to 500 transactions per second

                To go above these numbers partition between multiple storage accounts and partitions

                When limit is hit app will see lsquo503 server busyrsquo applications should implement exponential backoff

                Single Blob Partition

                bull Throughput up to 60 MBs

                PartitionKey (Category)

                RowKey (Title)

                Timestamp ReleaseDate

                Action Fast amp Furious hellip 2009

                Action The Bourne Ultimatum hellip 2007

                hellip hellip hellip hellip

                Animation Open Season 2 hellip 2009

                Animation The Ant Bully hellip 2006

                PartitionKey (Category)

                RowKey (Title)

                Timestamp ReleaseDate

                Comedy Office Space hellip 1999

                hellip hellip hellip hellip

                SciFi X-Men Origins Wolverine hellip 2009

                hellip hellip hellip hellip

                War Defiance hellip 2008

                PartitionKey (Category)

                RowKey (Title)

                Timestamp ReleaseDate

                Action Fast amp Furious hellip 2009

                Action The Bourne Ultimatum hellip 2007

                hellip hellip hellip hellip

                Animation Open Season 2 hellip 2009

                Animation The Ant Bully hellip 2006

                hellip hellip hellip hellip

                Comedy Office Space hellip 1999

                hellip hellip hellip hellip

                SciFi X-Men Origins Wolverine hellip 2009

                hellip hellip hellip hellip

                War Defiance hellip 2008

                Partitions and Partition Ranges

                Key Selection Things to Consider

                bullDistribute load as much as possible bullHot partitions can be load balanced bullPartitionKey is critical for scalability

                See httpwwwmicrosoftpdccom2009SVC09 and httpazurescopecloudappnet for more information

                bull Avoid frequent large scans bull Parallelize queries bull Point queries are most efficient

                bullTransactions across a single partition bullTransaction semantics amp Reduce round trips

                Scalability

                Query Efficiency amp Speed

                Entity group transactions

                Expect Continuation Tokens ndash Seriously

                Maximum of 1000 rows in a response

                At the end of partition range boundary

                Maximum of 1000 rows in a response

                At the end of partition range boundary

                Maximum of 5 seconds to execute the query

                Tables Recap bullEfficient for frequently used queries

                bullSupports batch transactions

                bullDistributes load

                Select PartitionKey and RowKey that help scale

                Avoid ldquoAppend onlyrdquo patterns

                Always Handle continuation tokens

                ldquoORrdquo predicates are not optimized

                Implement back-off strategy for retries

                bullDistribute by using a hash etc as prefix

                bullExpect continuation tokens for range queries

                bullExecute the queries that form the ldquoORrdquo predicates as separate queries

                bullServer busy

                bullLoad balance partitions to meet traffic needs

                bullLoad on single partition has exceeded the limits

                WCF Data Services

                bullUse a new context for each logical operation

                bullAddObjectAttachTo can throw exception if entity is already being tracked

                bullPoint query throws an exception if resource does not exist Use IgnoreResourceNotFoundException

                Queues Their Unique Role in Building Reliable Scalable Applications

                bull Want roles that work closely together but are not bound together bull Tight coupling leads to brittleness

                bull This can aid in scaling and performance

                bull A queue can hold an unlimited number of messages bull Messages must be serializable as XML

                bull Limited to 8KB in size

                bull Commonly use the work ticket pattern

                bull Why not simply use a table

                Queue Terminology

                Message Lifecycle

                Queue

                Msg 1

                Msg 2

                Msg 3

                Msg 4

                Worker Role

                Worker Role

                PutMessage

                Web Role

                GetMessage (Timeout) RemoveMessage

                Msg 2 Msg 1

                Worker Role

                Msg 2

                POST httpmyaccountqueuecorewindowsnetmyqueuemessages

                HTTP11 200 OK Transfer-Encoding chunked Content-Type applicationxml Date Tue 09 Dec 2008 210430 GMT Server Nephos Queue Service Version 10 Microsoft-HTTPAPI20

                ltxml version=10 encoding=utf-8gt ltQueueMessagesListgt ltQueueMessagegt ltMessageIdgt5974b586-0df3-4e2d-ad0c-18e3892bfca2ltMessageIdgt ltInsertionTimegtMon 22 Sep 2008 232920 GMTltInsertionTimegt ltExpirationTimegtMon 29 Sep 2008 232920 GMTltExpirationTimegt ltPopReceiptgtYzQ4Yzg1MDIGM0MDFiZDAwYzEwltPopReceiptgt ltTimeNextVisiblegtTue 23 Sep 2008 052920GMTltTimeNextVisiblegt ltMessageTextgtPHRlc3Q+dGdGVzdD4=ltMessageTextgt ltQueueMessagegt ltQueueMessagesListgt

                DELETE httpmyaccountqueuecorewindowsnetmyqueuemessagesmessageidpopreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

                Truncated Exponential Back Off Polling

                Consider a backoff polling approach Each empty poll

                increases interval by 2x

                A successful sets the interval back to 1

                44

                2 1

                1 1

                C1

                C2

                Removing Poison Messages

                1 1

                2 1

                3 4 0

                Producers Consumers

                P2

                P1

                3 0

                2 GetMessage(Q 30 s) msg 2

                1 GetMessage(Q 30 s) msg 1

                1 1

                2 1

                1 0

                2 0

                45

                C1

                C2

                Removing Poison Messages

                3 4 0

                Producers Consumers

                P2

                P1

                1 1

                2 1

                2 GetMessage(Q 30 s) msg 2 3 C2 consumed msg 2 4 DeleteMessage(Q msg 2) 7 GetMessage(Q 30 s) msg 1

                1 GetMessage(Q 30 s) msg 1 5 C1 crashed

                1 1

                2 1

                6 msg1 visible 30 s after Dequeue 3 0

                1 2

                1 1

                1 2

                46

                C1

                C2

                Removing Poison Messages

                3 4 0

                Producers Consumers

                P2

                P1

                1 2

                2 Dequeue(Q 30 sec) msg 2 3 C2 consumed msg 2 4 Delete(Q msg 2) 7 Dequeue(Q 30 sec) msg 1 8 C2 crashed

                1 Dequeue(Q 30 sec) msg 1 5 C1 crashed 10 C1 restarted 11 Dequeue(Q 30 sec) msg 1 12 DequeueCount gt 2 13 Delete (Q msg1) 1

                2

                6 msg1 visible 30s after Dequeue 9 msg1 visible 30s after Dequeue

                3 0

                1 3

                1 2

                1 3

                Queues Recap

                bullNo need to deal with failures Make message

                processing idempotent

                bullInvisible messages result in out of order Do not rely on order

                bullEnforce threshold on messagersquos dequeue count Use Dequeue count to remove

                poison messages

                bullMessages gt 8KB

                bullBatch messages

                bullGarbage collect orphaned blobs

                bullDynamically increasereduce workers

                Use blob to store message data with

                reference in message

                Use message count to scale

                bullNo need to deal with failures

                bullInvisible messages result in out of order

                bullEnforce threshold on messagersquos dequeue count

                bullDynamically increasereduce workers

                Windows Azure Storage Takeaways

                Blobs

                Drives

                Tables

                Queues

                httpblogsmsdncomwindowsazurestorage

                httpazurescopecloudappnet

                49

                A Quick Exercise

                hellipThen letrsquos look at some code and some tools

                50

                Code ndash AccountInformationcs public class AccountInformation private static string storageKey = ldquotHiSiSnOtMyKeY private static string accountName = jjstore private static StorageCredentialsAccountAndKey credentials internal static StorageCredentialsAccountAndKey Credentials get if (credentials == null) credentials = new StorageCredentialsAccountAndKey(accountName storageKey) return credentials

                51

                Code ndash BlobHelpercs public class BlobHelper private static string defaultContainerName = school private CloudBlobClient client = null private CloudBlobContainer container = null private void InitContainer() if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudBlobClient() container = clientGetContainerReference(defaultContainerName) containerCreateIfNotExist() BlobContainerPermissions permissions = containerGetPermissions() permissionsPublicAccess = BlobContainerPublicAccessTypeContainer containerSetPermissions(permissions)

                52

                Code ndash BlobHelpercs

                public void WriteFileToBlob(string filePath) if (client == null || container == null) InitContainer() FileInfo file = new FileInfo(filePath) CloudBlob blob = containerGetBlobReference(fileName) blobPropertiesContentType = GetContentType(fileExtension) blobUploadFile(fileFullName) Or if you want to write a string replace the last line with blobUploadText(someString) And make sure you set the content type to the appropriate MIME type (eg ldquotextplainrdquo)

                53

                Code ndash BlobHelpercs

                public string GetBlobText(string blobName) if (client == null || container == null) InitContainer() CloudBlob blob = containerGetBlobReference(blobName) try return blobDownloadText() catch (Exception) The blob probably does not exist or there is no connection available return null

                54

                Application Code - Blobs private void SaveToCloudButton_Click(object sender RoutedEventArgs e) StringBuilder buff = new StringBuilder() buffAppendLine(LastNameFirstNameEmailBirthdayNativeLanguageFavoriteIceCreamYearsInPhDGraduated) foreach (AttendeeEntity attendee in attendees) buffAppendLine(attendeeToCsvString()) blobHelperWriteStringToBlob(SummerSchoolAttendeestxt buffToString())

                The blob is now available at httpltAccountNamegtblobcorewindowsnetltContainerNamegtltBlobNamegt Or in this case httpjjstoreblobcorewindowsnetschoolSummerSchoolAttendeestxt

                55

                Code - TableEntities using MicrosoftWindowsAzureStorageClient public class AttendeeEntity TableServiceEntity public string FirstName get set public string LastName get set public string Email get set public DateTime Birthday get set public string FavoriteIceCream get set public int YearsInPhD get set public bool Graduated get set hellip

                56

                Code - TableEntities public void UpdateFrom(AttendeeEntity other) FirstName = otherFirstName LastName = otherLastName Email = otherEmail Birthday = otherBirthday FavoriteIceCream = otherFavoriteIceCream YearsInPhD = otherYearsInPhD Graduated = otherGraduated UpdateKeys() public void UpdateKeys() PartitionKey = SummerSchool RowKey = Email

                57

                Code ndash TableHelpercs public class TableHelper private CloudTableClient client = null private TableServiceContext context = null private DictionaryltstringAttendeeEntitygt allAttendees = null private string tableName = Attendees private CloudTableClient Client get if (client == null) client = new CloudStorageAccount(AccountInformationCredentials false)CreateCloudTableClient() return client private TableServiceContext Context get if (context == null) context = ClientGetDataServiceContext() return context

                58

                Code ndash TableHelpercs private void ReadAllAttendees() allAttendees = new Dictionaryltstring AttendeeEntitygt() CloudTableQueryltAttendeeEntitygt query = ContextCreateQueryltAttendeeEntitygt(tableName)AsTableServiceQuery() try foreach (AttendeeEntity attendee in query) allAttendees[attendeeEmail] = attendee catch (Exception) No entries in table - or other exception

                59

                Code ndash TableHelpercs public void DeleteAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return AttendeeEntity attendee = allAttendees[email] Delete from the cloud table ContextDeleteObject(attendee) ContextSaveChanges() Delete from the memory cache allAttendeesRemove(email)

                60

                Code ndash TableHelpercs public AttendeeEntity GetAttendee(string email) if (allAttendees == null) ReadAllAttendees() if (allAttendeesContainsKey(email)) return allAttendees[email] return null

                Remember that this only works for tables (or queries on tables) that easily fit in memory This is one of many design patterns for working with tables

                61

                Pseudo Code ndash TableHelpercs public void UpdateAttendees(ListltAttendeeEntitygt updatedAttendees) foreach (AttendeeEntity attendee in updatedAttendees) UpdateAttendee(attendee false) ContextSaveChanges(SaveChangesOptionsBatch) public void UpdateAttendee(AttendeeEntity attendee) UpdateAttendee(attendee true) private void UpdateAttendee(AttendeeEntity attendee bool saveChanges) if (allAttendeesContainsKey(attendeeEmail)) AttendeeEntity existingAttendee = allAttendees[attendeeEmail] existingAttendeeUpdateFrom(attendee) ContextUpdateObject(existingAttendee) else ContextAddObject(tableName attendee) if (saveChanges) ContextSaveChanges()

                62

                Application Code ndash Cloud Tables private void SaveButton_Click(object sender RoutedEventArgs e) Write to table tableHelperUpdateAttendees(attendees)

                Thatrsquos it Now your tables are accessible using REST service calls or any cloud storage tool

                63

                Tools ndash Fiddler2

                Best Practices

                Picking the Right VM Size

                bull Having the correct VM size can make a big difference in costs

                bull Fundamental choice ndash larger fewer VMs vs many smaller instances

                bull If you scale better than linear across cores larger VMs could save you money

                bull Pretty rare to see linear scaling across 8 cores

                bull More instances may provide better uptime and reliability (more failures needed to take your service down)

                bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                Using Your VM to the Maximum

                Remember bull 1 role instance == 1 VM running Windows

                bull 1 role instance = one specific task for your code

                bull Yoursquore paying for the entire VM so why not use it

                bull Common mistake ndash split up code into multiple roles each not using up CPU

                bull Balance between using up CPU vs having free capacity in times of need

                bull Multiple ways to use your CPU to the fullest

                Exploiting Concurrency

                bull Spin up additional processes each with a specific task or as a unit of concurrency

                bull May not be ideal if number of active processes exceeds number of cores

                bull Use multithreading aggressively

                bull In networking code correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

                bull In NET 4 use the Task Parallel Library

                bull Data parallelism

                bull Task parallelism

                Finding Good Code Neighbors

                bull Typically code falls into one or more of these categories

                bull Find code that is intensive with different resources to live together

                bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

                Memory Intensive

                CPU Intensive

                Network IO Intensive

                Storage IO Intensive

                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale (a minimal queue-depth signal is sketched below)

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs
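A minimal sketch of one common scaling signal: the depth of a work queue. The queue name, thresholds, and the ScaleOut/ConsiderScaleIn calls are placeholders; actual instance-count changes would go through the Service Management API or a management tool, not this code.

// Sketch: a crude scaling signal based on queue depth (StorageClient v1.x).
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class ScalingMonitor
{
    public void CheckQueueDepth()
    {
        CloudQueueClient client =
            new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
        CloudQueue queue = client.GetQueueReference("workitems");   // hypothetical queue name
        queue.CreateIfNotExist();

        int backlog = queue.RetrieveApproximateMessageCount();

        if (backlog > 500)
            ScaleOut();          // add instances (placeholder)
        else if (backlog == 0)
            ConsiderScaleIn();   // remember: new instances take minutes to come up
    }

    private void ScaleOut() { }
    private void ConsiderScaleIn() { }
}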

                Performance Cost

                Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference depending on your app profile

• Caching and compressing: they help a lot with storage costs

                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                Compressing Content

1. Gzip all output content

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms (a minimal upload sketch follows after this list)

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
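A minimal sketch, assuming the StorageClient v1.x API used in the BlobHelper.cs sample earlier, of gzipping text before it is stored as a blob and marking the Content-Encoding so browsers decompress it on the fly. The container is assumed to be initialized as in BlobHelper.cs.

// Sketch: gzip a string and upload it as a blob with Content-Encoding: gzip.
using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class CompressedBlobWriter
{
    public static void WriteCompressedText(CloudBlobContainer container, string blobName, string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);

        using (MemoryStream compressed = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            compressed.Position = 0;

            CloudBlob blob = container.GetBlobReference(blobName);
            blob.Properties.ContentType = "text/plain";
            blob.Properties.ContentEncoding = "gzip";   // browsers decompress on the fly
            blob.UploadFromStream(compressed);
        }
    }
}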

(Diagram: Uncompressed Content → Gzip, Minify JavaScript, Minify CSS, Minify Images → Compressed Content)

                Best Practices Summary

Doing 'less' is the key to saving costs.

                Measure everything

                Know your application profile in and out

                Research Examples in the Cloud

…on another set of slides

                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                76

                Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                77

Questions and Discussion…

                Thank you for hosting me at the Summer School

                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A single BLAST run can take 700–1,000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)

• Needs special result-reduction processing

Large volume of data

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation, data-parallel pattern
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management

  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes (a minimal launch sketch follows below)

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead.
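A minimal sketch of how a worker might invoke the legacy NCBI blastall executable with "-a" set to the instance's core count. The executable path, program name and file names are illustrative assumptions, not the AzureBLAST code itself.

// Sketch: launch NCBI-BLAST with "-a" matched to the VM's core count.
using System;
using System.Diagnostics;

public class BlastRunner
{
    public void RunPartition(string inputFasta, string databasePath, string outputPath)
    {
        var info = new ProcessStartInfo
        {
            FileName = "blastall.exe",   // assumed to be deployed with the role
            Arguments = string.Format(
                "-p blastp -d \"{0}\" -i \"{1}\" -o \"{2}\" -a {3}",
                databasePath, inputFasta, outputPath, Environment.ProcessorCount),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process blast = Process.Start(info))
        {
            blast.WaitForExit();   // in practice, also enforce a timeout here
        }
    }
}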

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time if the instance fails

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs (see the queue sketch after this list)

• Watch out for the 2-hour maximum limitation
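A minimal sketch of the Web Role + Queue + Worker pattern with a per-task visibility timeout, assuming the StorageClient v1.x queue API. The 30-minute estimate, the dequeue-count threshold and ProcessTask are placeholders.

// Sketch: dequeue a BLAST task with a visibility timeout sized to the estimated run time.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class BlastWorker
{
    public void PollOnce(CloudQueue taskQueue)
    {
        // Estimate from the number of pair-bases in the partition,
        // kept below the 2-hour maximum visibility timeout.
        TimeSpan visibility = TimeSpan.FromMinutes(30);

        CloudQueueMessage msg = taskQueue.GetMessage(visibility);
        if (msg == null) return;                 // empty queue: back off before polling again

        if (msg.DequeueCount > 2)
        {
            taskQueue.DeleteMessage(msg);        // poison message: give up on it
            return;
        }

        ProcessTask(msg.AsString);               // run the BLAST partition
        taskQueue.DeleteMessage(msg);            // delete only after the work is durable
    }

    private void ProcessTask(string taskDescription) { }
}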

(Diagram: Splitting task → BLAST tasks in parallel → Merging task)

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resources

(Architecture diagram: a Web Portal and Web Service in a Web Role handle job registration; a Job Management Role runs the Job Scheduler, Scaling Engine and a database-updating role; BLAST worker instances pull work from a global dispatch queue; Azure Tables hold the Job Registry and NCBI database metadata; Azure Blob storage holds the BLAST databases and temporary data. Splitting task → BLAST tasks → Merging task.)

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track each job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table (a hypothetical entity sketch follows below)
• Fault tolerance: avoid in-memory state
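A hypothetical sketch of what such a job registry entity could look like, following the same TableServiceEntity pattern as the AttendeeEntity shown earlier. The property names and key layout are illustrative assumptions, not the AzureBLAST schema.

// Sketch: a job registry entity stored in an Azure Table.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class JobEntity : TableServiceEntity
{
    public string SubmittedBy { get; set; }     // Live ID of the submitter
    public string Status { get; set; }          // e.g. "Queued", "Running", "Done"
    public DateTime SubmittedAt { get; set; }
    public string InputBlobUri { get; set; }    // where the input sequences live
    public string LogBlobUri { get; set; }      // where per-job logs are written

    public JobEntity() { }

    public JobEntity(string jobId)
    {
        PartitionKey = "Jobs";   // keep all jobs in one partition for simple scans
        RowKey = jobId;          // a GUID or similar unique id
    }
}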

(Diagram: Web Portal and Web Service front end → job registration → Job Scheduler, with the Job Portal, Scaling Engine and Job Registry highlighted)

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments (a splitting sketch follows below)
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually
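A minimal sketch of the query-segmentation step: splitting a FASTA file into partitions of roughly 100 sequences, the size reported as best above. This is an illustration only, not the AzureBLAST splitter; the file layout is assumed to be standard FASTA with ">" header lines.

// Sketch: split a FASTA file into partitions of N sequences (.NET 4).
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FastaSplitter
{
    public static IEnumerable<string> Split(string fastaPath, int sequencesPerPartition)
    {
        var partition = new StringBuilder();
        int count = 0;

        foreach (string line in File.ReadLines(fastaPath))
        {
            // Start a new partition once the current one holds N sequences.
            if (line.StartsWith(">") && count == sequencesPerPartition)
            {
                yield return partition.ToString();
                partition.Clear();
                count = 0;
            }
            if (line.StartsWith(">")) count++;
            partition.AppendLine(line);
        }

        if (partition.Length > 0)
            yield return partition.ToString();   // last, possibly short, partition
    }
}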

(Chart: number of worker instances per deployment, roughly 50-62 VMs in each of the 8 deployments)

• The total size of the output is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
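A minimal sketch of the kind of log scan used to "analyze what took place": finding task IDs that were started on a node but never reported completion. The log format follows the sample records above; file paths and the exact phrasing of the regexes are assumptions.

// Sketch: find tasks that started but never completed, from one node's log file.
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogAnalyzer
{
    static readonly Regex Started  = new Regex(@"Executing the task (\d+)");
    static readonly Regex Finished = new Regex(@"Execution of task (\d+) is done");

    public static IEnumerable<string> FindLostTasks(string logFile)
    {
        var started  = new HashSet<string>();
        var finished = new HashSet<string>();

        foreach (string line in File.ReadLines(logFile))
        {
            Match m;
            if ((m = Started.Match(line)).Success)  started.Add(m.Groups[1].Value);
            if ((m = Finished.Match(line)).Success) finished.Add(m.Groups[1].Value);
        }

        started.ExceptWith(finished);   // tasks with no matching "is done" record
        return started;
    }
}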

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group; this is an update domain
• ~30 minutes
• ~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed, and the job was killed

A reasonable guess: the fault domain is working

                MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J/g)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stomata (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

\[ ET = \frac{\Delta\, R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v} \]

                Penman-Monteith (1964)

                Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
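A direct evaluation of the Penman-Monteith formula above, as a sketch only; variable names follow the symbol list, and unit handling (and the default γ ≈ 66 Pa K-1) is left to the caller to verify.

// Sketch: evaluate the Penman-Monteith equation for ET.
public static class PenmanMonteith
{
    public static double ComputeET(
        double delta,        // Δ: rate of change of saturation specific humidity (Pa/K)
        double Rn,           // net radiation (W/m^2)
        double rhoA,         // dry air density (kg/m^3)
        double cp,           // specific heat capacity of air (J/(kg K))
        double dq,           // vapor pressure deficit (Pa)
        double ga,           // conductivity of air (m/s)
        double gs,           // conductivity of plant stomata (m/s)
        double lambdaV,      // latent heat of vaporization (J/g)
        double gamma = 66.0) // psychrometric constant (Pa/K)
    {
        return (delta * Rn + rhoA * cp * dq * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}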

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

                20 US year = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal and the Request Queue; Source Imagery Download Sites feed the Data Collection Stage via the Download Queue; the Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation and Analysis Reduction Stages; scientific results are downloaded by the scientists; source metadata is kept alongside)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, which are recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue)

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local files on the Azure worker instance (a minimal sketch follows below)
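A minimal sketch of that marshalling step, assuming the same StorageClient v1.x blob API as the earlier BlobHelper.cs sample: pull an input blob down to local storage before running the executable, then push the output back. Container and file names are illustrative.

// Sketch: copy blobs to/from local files around an executable run.
using Microsoft.WindowsAzure.StorageClient;

public static class BlobMarshaller
{
    public static void DownloadInput(CloudBlobContainer container, string blobName, string localPath)
    {
        CloudBlob blob = container.GetBlobReference(blobName);
        blob.DownloadToFile(localPath);     // stage input on the local disk
    }

    public static void UploadOutput(CloudBlobContainer container, string blobName, string localPath)
    {
        CloudBlob blob = container.GetBlobReference(blobName);
        blob.UploadFile(localPath);         // persist results back to blob storage
    }
}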

(Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage)

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

(Diagram: a Reprojection Request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role), parsed into ReprojectionTaskStatus entries, and dispatched via the Job and Task Queues to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage)

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

(Pipeline diagram repeated, annotated with approximate volumes and costs per stage, submitted through the AzureMODIS Service Web Role Portal)

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3,500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1,800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1,800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                  Remember that this only works for tables (or queries on tables) that easily fit in memory This is one of many design patterns for working with tables

                  61

                  Pseudo Code ndash TableHelpercs public void UpdateAttendees(ListltAttendeeEntitygt updatedAttendees) foreach (AttendeeEntity attendee in updatedAttendees) UpdateAttendee(attendee false) ContextSaveChanges(SaveChangesOptionsBatch) public void UpdateAttendee(AttendeeEntity attendee) UpdateAttendee(attendee true) private void UpdateAttendee(AttendeeEntity attendee bool saveChanges) if (allAttendeesContainsKey(attendeeEmail)) AttendeeEntity existingAttendee = allAttendees[attendeeEmail] existingAttendeeUpdateFrom(attendee) ContextUpdateObject(existingAttendee) else ContextAddObject(tableName attendee) if (saveChanges) ContextSaveChanges()

                  62

                  Application Code ndash Cloud Tables private void SaveButton_Click(object sender RoutedEventArgs e) Write to table tableHelperUpdateAttendees(attendees)

                  Thatrsquos it Now your tables are accessible using REST service calls or any cloud storage tool

                  63

                  Tools ndash Fiddler2

                  Best Practices

                  Picking the Right VM Size

                  bull Having the correct VM size can make a big difference in costs

                  bull Fundamental choice ndash larger fewer VMs vs many smaller instances

                  bull If you scale better than linear across cores larger VMs could save you money

                  bull Pretty rare to see linear scaling across 8 cores

                  bull More instances may provide better uptime and reliability (more failures needed to take your service down)

                  bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                  Using Your VM to the Maximum

                  Remember bull 1 role instance == 1 VM running Windows

                  bull 1 role instance = one specific task for your code

                  bull Yoursquore paying for the entire VM so why not use it

                  bull Common mistake ndash split up code into multiple roles each not using up CPU

                  bull Balance between using up CPU vs having free capacity in times of need

                  bull Multiple ways to use your CPU to the fullest

                  Exploiting Concurrency

                  bull Spin up additional processes each with a specific task or as a unit of concurrency

                  bull May not be ideal if number of active processes exceeds number of cores

                  bull Use multithreading aggressively

                  bull In networking code correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

                  bull In NET 4 use the Task Parallel Library

                  bull Data parallelism

                  bull Task parallelism

                  Finding Good Code Neighbors

                  bull Typically code falls into one or more of these categories

                  bull Find code that is intensive with different resources to live together

                  bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

                  Memory Intensive

                  CPU Intensive

                  Network IO Intensive

                  Storage IO Intensive

                  Scaling Appropriately

                  bull Monitor your application and make sure yoursquore scaled appropriately (not over-scaled)

                  bull Spinning VMs up and down automatically is good at large scale

                  bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

                  bull Being too aggressive in spinning down VMs can result in poor user experience

                  bull Trade-off between risk of failurepoor user experience due to not having excess capacity and the costs of having idling VMs

                  Performance Cost

                  Storage Costs

                  bull Understand an applicationrsquos storage profile and how storage billing works

                  bull Make service choices based on your app profile

                  bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

                  bull Service choice can make a big cost difference based on your app profile

                  bull Caching and compressing They help a lot with storage costs

                  Saving Bandwidth Costs

                  Bandwidth costs are a huge part of any popular web apprsquos billing profile

                  Sending fewer things over the wire often means getting fewer things from storage

                  Saving bandwidth costs often lead to savings in other places

                  Sending fewer things means your VM has time to do other tasks

                  All of these tips have the side benefit of improving your web apprsquos performance and user experience

                  Compressing Content

                  1 Gzip all output content

                  bull All modern browsers can decompress on the fly

                  bull Compared to Compress Gzip has much better compression and freedom from patented algorithms

                  2Tradeoff compute costs for storage size

                  3Minimize image sizes

                  bull Use Portable Network Graphics (PNGs)

                  bull Crush your PNGs

                  bull Strip needless metadata

                  bull Make all PNGs palette PNGs

                  Uncompressed

                  Content

                  Compressed

                  Content

                  Gzip

                  Minify JavaScript

                  Minify CCS

                  Minify Images

                  Best Practices Summary

                  Doing lsquolessrsquo is the key to saving costs

                  Measure everything

                  Know your application profile in and out

                  Research Examples in the Cloud

                  hellipon another set of slides

                  Map Reduce on Azure

                  bull Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web bull Hadoop implementation bull Hadoop has a long history and has been improved for stability bull Originally Designed for Cluster Systems

                  bull Microsoft Research this week is announcing a project code named Daytona for Map Reduce jobs on Azure bull Designed from the start to use cloud primitives bull Built-in fault tolerance bull REST based interface for writing your own clients

                  76

                  Project Daytona - Map Reduce on Azure

                  httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                  77

                  Questions and Discussionhellip

                  Thank you for hosting me at the Summer School

                  BLAST (Basic Local Alignment Search Tool)

                  bull The most important software in bioinformatics

                  bull Identify similarity between bio-sequences

                  Computationally intensive

                  bull Large number of pairwise alignment operations

                  bull A BLAST running can take 700 ~ 1000 CPU hours

                  bull Sequence databases growing exponentially

                  bull GenBank doubled in size in about 15 months

                  It is easy to parallelize BLAST

                  bull Segment the input

                  bull Segment processing (querying) is pleasingly parallel

                  bull Segment the database (eg mpiBLAST)

                  bull Needs special result reduction processing

                  Large volume data

                  bull A normal Blast database can be as large as 10GB

                  bull 100 nodes means the peak storage bandwidth could reach to 1TB

                  bull The output of BLAST is usually 10-100x larger than the input

                  bull Parallel BLAST engine on Azure

                  bull Query-segmentation data-parallel pattern bull split the input sequences

                  bull query partitions in parallel

                  bull merge results together when done

                  bull Follows the general suggested application model bull Web Role + Queue + Worker

                  bull With three special considerations bull Batch job management

                  bull Task parallelism on an elastic Cloud

                  Wei Lu Jared Jackson and Roger Barga AzureBlast A Case Study of Developing Science Applications on the Cloud in Proceedings of the 1st Workshop

                  on Scientific Cloud Computing (Science Cloud 2010) Association for Computing Machinery Inc 21 June 2010

                  A simple SplitJoin pattern

                  Leverage multi-core of one instance bull argument ldquondashardquo of NCBI-BLAST

                  bull 1248 for small middle large and extra large instance size

                  Task granularity bull Large partition load imbalance

                  bull Small partition unnecessary overheads bull NCBI-BLAST overhead

                  bull Data transferring overhead

                  Best Practice test runs to profiling and set size to mitigate the overhead

                  Value of visibilityTimeout for each BLAST task bull Essentially an estimate of the task run time

                  bull too small repeated computation

                  bull too large unnecessary long period of waiting time in case of the instance failure Best Practice

                  bull Estimate the value based on the number of pair-bases in the partition and test-runs

                  bull Watch out for the 2-hour maximum limitation

[Diagram: a Splitting task fans the input out to many parallel BLAST tasks, whose outputs are combined by a Merging task.]

Task size vs. performance:
• Benefit from the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance:
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost:
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Diagram: AzureBLAST architecture. A Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler and scaling engine and feeds a global dispatch queue consumed by many Worker instances; Azure Tables hold the job registry; Azure Blob storage holds the NCBI/BLAST databases and temporary data; a database-updating role keeps the BLAST databases current.]

An ASP.NET program hosted by a web role instance:
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the job portal. The web portal and web service handle job registration; behind them sit the job scheduler, scaling engine, and job registry.]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs:
• Discover the interrelationships of known protein sequences

"All against All" query:
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences in total to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation:
• Based on sample runs on one extra-large Azure instance
• Would require roughly 32 million minutes (61 years) on one desktop

One of the biggest BLAST jobs that we know of:
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) in four data centers: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances appear, redistribute the load manually

[Figure: instance counts for the 8 deployments, on the order of 50–62 extra-large VMs each.]

• Total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…

A normal log record should look like this:

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins
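To spot anomalies like the missing completion record for task 251774 above, a simple scan over the worker logs is enough. The following is an illustrative C# sketch, not the AzureBlast tooling; it only assumes the log format shown above.

// Illustrative sketch: flag tasks that were started ("Executing the task ...")
// but never reported "Execution of task ... is done".
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogScanner
{
    public static IEnumerable<string> FindIncompleteTasks(string logPath)
    {
        var started = new HashSet<string>();
        var finished = new HashSet<string>();
        var startPattern = new Regex(@"Executing the task (\S+)");
        var donePattern = new Regex(@"Execution of task (\S+) is done");

        foreach (string line in File.ReadLines(logPath))
        {
            Match done = donePattern.Match(line);
            if (done.Success) { finished.Add(done.Groups[1].Value); continue; }

            Match start = startPattern.Match(line);
            if (start.Success) started.Add(start.Groups[1].Value);
        }

        foreach (string taskId in started)
            if (!finished.Contains(taskId))
                yield return taskId;    // started but never completed
    }
}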

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, each outage lasting ~30 minutes; this is an update domain at work
• 35 nodes experienced a blob-writing failure at the same time

West Europe data center: 30,976 tasks were completed, and then the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" (Irish proverb)

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J g^-1)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs, big data reduction
• Some of the inputs are not so simple

ET = \frac{\Delta R_n + \rho_a c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a / g_s)\right)\lambda_v}

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) from plants.
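As a concrete illustration of the Penman-Monteith form above, the per-pixel calculation reduces to a small arithmetic kernel once the inputs have been assembled; this is my own sketch, not MODISAzure code, and unit handling is left to the caller.

// Illustrative sketch: Penman-Monteith ET from the inputs listed above.
public static class PenmanMonteith
{
    public static double Evapotranspiration(
        double delta,                 // Δ: d(saturation specific humidity)/dT (Pa K^-1)
        double netRadiation,          // Rn (W m^-2)
        double airDensity,            // ρa (kg m^-3)
        double specificHeat,          // cp (J kg^-1 K^-1)
        double vaporPressureDeficit,  // δq (Pa)
        double airConductivity,       // ga (m s^-1)
        double stomatalConductivity,  // gs (m s^-1)
        double latentHeat,            // λv (J g^-1)
        double gamma = 66.0)          // γ: psychrometric constant (Pa K^-1)
    {
        double numerator = delta * netRadiation
                         + airDensity * specificHeat * vaporPressureDeficit * airConductivity;
        double denominator = (delta + gamma * (1.0 + airConductivity / stomatalConductivity))
                           * latentHeat;
        return numerator / denominator;    // ET, water volume evapotranspired
    }
}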

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage:
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage:
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: the MODISAzure pipeline. Scientists submit requests through the AzureMODIS Service web role portal; a request queue feeds the data collection stage, which pulls imagery from the source imagery download sites; download, reprojection, reduction 1, and reduction 2 queues drive the reprojection, derivation reduction, and analysis reduction stages; source metadata is tracked and scientific results are stored for the scientists to download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks (recoverable units of work)
• Execution status of all jobs and tasks is persisted in Tables
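For flavor, a task-status row of the kind described above might look like the following sketch, which uses the TableServiceEntity base class from the StorageClient library shown elsewhere in this deck; the property names and key choices are assumptions for illustration, not the actual MODISAzure schema.

// Illustrative sketch: one row per task, partitioned by job so that a job's
// tasks can be queried from a single partition.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class TaskStatusEntity : TableServiceEntity
{
    public TaskStatusEntity() { }                 // required for serialization

    public TaskStatusEntity(string jobId, string taskId)
    {
        PartitionKey = jobId;                     // all tasks of a job share a partition
        RowKey = taskId;                          // unique within the job
    }

    public string Stage { get; set; }             // e.g. "Reprojection"
    public string State { get; set; }             // e.g. "Queued", "Running", "Done", "Failed"
    public int RetryCount { get; set; }
    public DateTime LastUpdatedUtc { get; set; }
}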

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work items to the <PipelineStage>Task Queue.]

All work is actually done by a GenericWorker (Worker Role):
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, which GenericWorker (Worker Role) instances consume while reading and writing <Input>Data Storage.]
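A minimal sketch of such a worker loop, again assuming the StorageClient queue API used in this deck; RunTask and the 3-retry policy mirror the bullets above, while the 30-minute visibility timeout and the task format are illustrative assumptions.

// Illustrative sketch: dequeue a task, run it, delete it on success,
// and drop it once it has failed too many times.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    private readonly CloudQueue taskQueue;

    public GenericWorkerLoop(CloudQueue taskQueue) { this.taskQueue = taskQueue; }

    public void PollOnce()
    {
        CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
        if (msg == null) return;                  // queue empty; caller should back off

        if (msg.DequeueCount > 3)
        {
            taskQueue.DeleteMessage(msg);         // poison message: stop retrying it
            return;
        }

        try
        {
            RunTask(msg.AsString);                // stage blobs, run the executable, upload results
            taskQueue.DeleteMessage(msg);         // delete only after success
        }
        catch (Exception)
        {
            // Leave the message alone: it becomes visible again and will be retried.
        }
    }

    private void RunTask(string taskDescription)
    {
        // Placeholder for the sandboxed science executable described above.
    }
}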

[Diagram: the reprojection pipeline in detail. A Reprojection Request is persisted as ReprojectionJobStatus (each entity specifies a single reprojection job request) and parsed by the Service Monitor (Worker Role) into ReprojectionTaskStatus entries (each entity specifies a single reprojection task, i.e. a single tile); GenericWorker (Worker Role) instances consume the task queue, reading swath source data storage and writing reprojection data storage. Two supporting tables are used: SwathGranuleMeta (query this table to get geo-metadata, e.g. boundaries, for each swath tile) and ScanTimeList (query this table to get the list of satellite scan times that cover a target tile).]

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Figure: the pipeline diagram, annotated with per-stage data volumes and costs]

Data collection stage: 400–500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; costs: $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3,500 compute hours, 20–100 workers; costs: $420 CPU, $60 download
Derivation reduction stage: 5–7 GB, 55K files, 1,800 compute hours, 20–100 workers; costs: $216 CPU, $1 download, $6 storage
Analysis reduction stage: <10 GB, ~1K files, 1,800 compute hours, 20–100 workers; costs: $216 CPU, $2 download, $9 storage

Total: $1,420
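As a rough cross-check (my arithmetic, not from the slide): at the compute rate of about $0.12 per instance-hour in effect at the time, 3,500 hours comes to roughly $420 and 1,800 hours to roughly $216, which matches the CPU line items above, and the listed items sum to approximately the quoted $1,420 total.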

• Clouds are the largest-scale computer centers ever constructed, and they have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers


                    Map Reduce on Azure

                    bull Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web bull Hadoop implementation bull Hadoop has a long history and has been improved for stability bull Originally Designed for Cluster Systems

                    bull Microsoft Research this week is announcing a project code named Daytona for Map Reduce jobs on Azure bull Designed from the start to use cloud primitives bull Built-in fault tolerance bull REST based interface for writing your own clients

                    76

                    Project Daytona - Map Reduce on Azure

                    httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                    77

                    Questions and Discussionhellip

                    Thank you for hosting me at the Summer School

                    BLAST (Basic Local Alignment Search Tool)

                    bull The most important software in bioinformatics

                    bull Identify similarity between bio-sequences

                    Computationally intensive

                    bull Large number of pairwise alignment operations

                    bull A BLAST running can take 700 ~ 1000 CPU hours

                    bull Sequence databases growing exponentially

                    bull GenBank doubled in size in about 15 months

                    It is easy to parallelize BLAST

                    bull Segment the input

                    bull Segment processing (querying) is pleasingly parallel

                    bull Segment the database (eg mpiBLAST)

                    bull Needs special result reduction processing

                    Large volume data

                    bull A normal Blast database can be as large as 10GB

                    bull 100 nodes means the peak storage bandwidth could reach to 1TB

                    bull The output of BLAST is usually 10-100x larger than the input

                    bull Parallel BLAST engine on Azure

                    bull Query-segmentation data-parallel pattern bull split the input sequences

                    bull query partitions in parallel

                    bull merge results together when done

                    bull Follows the general suggested application model bull Web Role + Queue + Worker

                    bull With three special considerations bull Batch job management

                    bull Task parallelism on an elastic Cloud

                    Wei Lu Jared Jackson and Roger Barga AzureBlast A Case Study of Developing Science Applications on the Cloud in Proceedings of the 1st Workshop

                    on Scientific Cloud Computing (Science Cloud 2010) Association for Computing Machinery Inc 21 June 2010

                    A simple SplitJoin pattern

                    Leverage multi-core of one instance bull argument ldquondashardquo of NCBI-BLAST

                    bull 1248 for small middle large and extra large instance size

                    Task granularity bull Large partition load imbalance

                    bull Small partition unnecessary overheads bull NCBI-BLAST overhead

                    bull Data transferring overhead

                    Best Practice test runs to profiling and set size to mitigate the overhead

                    Value of visibilityTimeout for each BLAST task bull Essentially an estimate of the task run time

                    bull too small repeated computation

                    bull too large unnecessary long period of waiting time in case of the instance failure Best Practice

                    bull Estimate the value based on the number of pair-bases in the partition and test-runs

                    bull Watch out for the 2-hour maximum limitation

                    BLAST task

                    Splitting task

                    BLAST task

                    BLAST task

                    BLAST task

                    hellip

                    Merging Task

                    Task size vs Performance

                    bull Benefit of the warm cache effect

                    bull 100 sequences per partition is the best choice

                    Instance size vs Performance

                    bull Super-linear speedup with larger size worker instances

                    bull Primarily due to the memory capability

                    Task SizeInstance Size vs Cost

                    bull Extra-large instance generated the best and the most economical throughput

                    bull Fully utilize the resource

                    Web

                    Portal

                    Web

                    Service

                    Job registration

                    Job Scheduler

                    Worker

                    Worker

                    Worker

                    Global

                    dispatch

                    queue

                    Web Role

                    Azure Table

                    Job Management Role

                    Azure Blob

                    Database

                    updating Role

                    hellip

                    Scaling Engine

                    Blast databases

                    temporary data etc)

                    Job Registry NCBI databases

                    BLAST task

                    Splitting task

                    BLAST task

                    BLAST task

                    BLAST task

                    hellip

                    Merging Task

                    ASPNET program hosted by a web role instance bull Submit jobs

                    bull Track jobrsquos status and logs

                    AuthenticationAuthorization based on Live ID

                    The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states

                    Web Portal

                    Web Service

                    Job registration

                    Job Scheduler

                    Job Portal

                    Scaling Engine

                    Job Registry

                    R palustris as a platform for H2 production Eric Shadt SAGE Sam Phattarasukol Harwood Lab UW

                    Blasted ~5000 proteins (700K sequences) bull Against all NCBI non-redundant proteins completed in 30 min

                    bull Against ~5000 proteins from another strain completed in less than 30 sec

                    AzureBLAST significantly saved computing timehellip

                    Discovering Homologs bull Discover the interrelationships of known protein sequences

                    ldquoAll against Allrdquo query bull The database is also the input query

                    bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

                    bull Theoretically 100 billion sequence comparisons

                    Performance estimation bull Based on the sampling-running on one extra-large Azure instance

                    bull Would require 3216731 minutes (61 years) on one desktop

                    One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

                    bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                    bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                    bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                    bull Each segment consists of smaller partitions

                    bull When load imbalances redistribute the load manually

                    5

                    0

                    62 6

                    2 6

                    2 6

                    2 6

                    2 5

                    0 62

                    bull Total size of the output result is ~230GB

                    bull The number of total hits is 1764579487

                    bull Started at March 25th the last task completed on April 8th (10 days compute)

                    bull But based our estimates real working instance time should be 6~8 day

                    bull Look into log data to analyze what took placehellip

                    5

                    0

                    62 6

                    2 6

                    2 6

                    2 6

                    2 5

                    0 62

A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
All 62 compute nodes lost tasks and then came back in a group: this is an Update Domain (~30 mins, ~6 nodes in one group)
35 nodes experienced blob writing failures at the same time
West Europe Data Center: 30,976 tasks were completed, then the job was killed
A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration or evaporation through plant membranes by plants.
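To make the reduction concrete, here is a minimal C# sketch of the Penman-Monteith formula above; the class and method names are mine, not part of MODISAzure, and callers must supply inputs in the units listed above.

    public static class EvapotranspirationSketch
    {
        // Penman-Monteith: ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
        public static double PenmanMonteith(
            double delta,   // Δ  (Pa K-1)
            double rn,      // Rn (W m-2)
            double rhoA,    // ρa (kg m-3)
            double cp,      // cp (J kg-1 K-1)
            double dq,      // δq (Pa)
            double ga,      // ga (m s-1)
            double gs,      // gs (m s-1)
            double gamma,   // γ  (Pa K-1), roughly 66
            double lambdaV) // λv (J g-1)
        {
            double numerator = delta * rn + rhoA * cp * dq * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }
    }

The hard part in practice is not this arithmetic but producing the conductivity inputs across a catchment, which is what the pipeline below is for.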

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Architecture diagram: AzureMODIS Service Web Role Portal; Request, Download, Reprojection, Reduction 1 and Reduction 2 Queues; Source Metadata; Source Imagery Download Sites; Data Collection, Reprojection, Derivation Reduction and Analysis Reduction Stages; Scientific Results Download; science results delivered to Scientists]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door • Receives all user requests
• Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role • Parses all job requests into tasks, recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pick up tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus via the Job Queue and parses and persists ReprojectionTaskStatus, dispatching to the Task Queue; GenericWorker (Worker Role) instances execute the tasks against ScanTimeList, SwathGranuleMeta and Reprojection Data Storage]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
• Source swath data lives in Swath Source Data Storage

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

MODISAzure costs by stage (front end: AzureMODIS Service Web Role Portal):
Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                    • Day 2 - Azure as a PaaS
                    • Day 2 - Applications

Windows Azure Compute

Key Components – Compute: Web Roles

Web Front End
• Cloud web server
• Web pages
• Web services

You can create the following types:
• ASP.NET web roles
• ASP.NET MVC 2 web roles
• WCF service web roles
• Worker roles
• CGI-based web roles

Key Components – Compute: Worker Roles
• Utility compute
• Windows Server 2008
• Background processing
• Each role can define an amount of local storage
• Protected space on the local drive, considered volatile storage
• May communicate with outside services
• Azure Storage
• SQL Azure
• Other Web services
• Can expose external and internal endpoints

Suggested Application Model: Using queues for reliable messaging

Scalable, Fault-Tolerant Applications

Queues are the application glue (see the sketch below)
• Decouple parts of the application, easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)
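A rough illustration of this pattern with the Windows Azure StorageClient library: a web role enqueues a small "work ticket" and a worker role polls for it. The queue name and message contents here are invented for the example.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class WorkTicketSketch
    {
        // Web role side: enqueue a work ticket that points at the real data.
        public static void EnqueueTicket(CloudStorageAccount account, string blobName)
        {
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
            queue.CreateIfNotExist();
            queue.AddMessage(new CloudQueueMessage(blobName));
        }

        // Worker role side: dequeue, process, then delete the message.
        public static void ProcessOneTicket(CloudStorageAccount account)
        {
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
            queue.CreateIfNotExist();
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5)); // visibility timeout
            if (msg == null) return;        // queue empty; back off before polling again
            // ... do the work referenced by msg.AsString ...
            queue.DeleteMessage(msg);       // only after the work has succeeded
        }
    }

If the worker crashes before DeleteMessage, the message reappears after the visibility timeout, which is what makes the messaging reliable.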

Key Components – Compute: VM Roles
• Customized role
• You own the box
• How it works:
• Download the "Guest OS" to Server 2008 Hyper-V
• Customize the OS as you need to
• Upload the differencing VHD
• Azure runs your VM role using
• the base OS
• the differencing VHD

Application Hosting

'Grokking' the service model
• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy and manage that diagram for you:
• Find a hardware home
• Copy and launch your app binaries
• Monitor your app and the hardware
• In case of failure, take action, perhaps even relocating your app
• At all times the 'diagram' stays whole

Automated Service Management: Provide code + service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
ServiceDefinition.csdef (the Service Definition)
ServiceConfiguration.cscfg (the Service Configuration)
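At run time a role reads the settings declared in those files through the ServiceRuntime API; a minimal sketch, where the setting name "StorageAccountName" is just an example:

    using Microsoft.WindowsAzure.ServiceRuntime;

    public static class ConfigSketch
    {
        public static string GetStorageAccountName()
        {
            // "StorageAccountName" must be declared in ServiceDefinition.csdef
            // and given a value in ServiceConfiguration.cscfg.
            return RoleEnvironment.GetConfigurationSettingValue("StorageAccountName");
        }
    }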

GUI: double click on the Role Name in the Azure Project

Deploying to the cloud
• We can deploy from the portal or from script
• VS builds two files:
• An encrypted package of your code
• Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes
• (which is better than six months)

Service Management API
• REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap…
• Lots of community and MSFT-built tools around the API
• Easy to roll your own
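A rough sketch of what a hand-rolled call can look like; the subscription ID, certificate and the "x-ms-version" value are placeholders, so check the Service Management API documentation for the exact operation URIs and current version string:

    using System.IO;
    using System.Net;
    using System.Security.Cryptography.X509Certificates;

    public static class ServiceManagementSketch
    {
        // Lists hosted services for a subscription; authentication is the management certificate.
        public static string ListHostedServices(string subscriptionId, X509Certificate2 managementCert)
        {
            string url = "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.Headers.Add("x-ms-version", "2011-06-01"); // example version header
            request.ClientCertificates.Add(managementCert);
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd(); // XML describing the hosted services
            }
        }
    }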

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage, At Massive Scale

Blob - Massive files, e.g. videos, logs
Drive - Use standard file system APIs
Tables - Non-relational, but with few scale limits - Use SQL Azure for relational data
Queues - Facilitate loosely-coupled, reliable systems

Blob Features and Functions
• Store Large Objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST Interface
• PutBlob: inserts a new blob, overwrites an existing blob
• GetBlob: gets the whole blob or a specific range
• DeleteBlob
• CopyBlob
• SnapshotBlob
• LeaseBlob

Two Types of Blobs Under the Hood
• Block Blob
• Targeted at streaming workloads
• Each blob consists of a sequence of blocks • Each block is identified by a Block ID
• Size limit: 200 GB per blob
• Page Blob
• Targeted at random read/write workloads
• Each blob consists of an array of pages • Each page is identified by its offset from the start of the blob
• Size limit: 1 TB per blob

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
• Use existing NTFS APIs to access a durable drive • Durability and survival of data on application failover
• Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
• Example: mount a Page Blob as X: • http://<accountname>.blob.core.windows.net/<containername>/<blobname>
• All writes to the drive are made durable to the Page Blob • The drive is made durable through standard Page Blob replication
• The drive (the Page Blob) persists even when not mounted

Windows Azure Tables
• Provides Structured Storage
• Massively Scalable Tables • Billions of entities (rows) and TBs of data
• Can use thousands of servers as traffic grows
• Highly Available & Durable • Data is replicated several times
• Familiar and Easy to use API
• WCF Data Services and OData • .NET classes and LINQ
• REST, with any platform or language

Windows Azure Queues
• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance
• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy"
• The system load balances to meet your traffic needs
• "Server Busy" means single-partition limits have been reached

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey • Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind) | Name | CreditCardNumber | OrderTotal
1 | Customer-John Smith | John Smith | xxxx-xxxx-xxxx-xxxx |
1 | Order – 1 | | | $35.12
2 | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2 | Order – 3 | | | $10.00

Blobs – Container name + Blob name • Every blob and its snapshots are in a single partition

Container Name | Blob Name
image | annarbor/bighouse.jpg
image | foxborough/gillette.jpg
video | annarbor/bighouse.jpg

Messages – Queue Name • All messages for a single queue belong to the same partition

Queue | Message
jobs | Message1
jobs | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

Example: a movie table with PartitionKey = Category and RowKey = Title.

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Action | Fast & Furious | … | 2009
Action | The Bourne Ultimatum | … | 2007
… | … | … | …
Animation | Open Season 2 | … | 2009
Animation | The Ant Bully | … | 2006
… | … | … | …
Comedy | Office Space | … | 1999
… | … | … | …
SciFi | X-Men Origins: Wolverine | … | 2009
… | … | … | …
War | Defiance | … | 2008

The same table can be served as a single partition range, or split across servers as traffic grows, e.g. the Action–Animation range on one server and the Comedy–War range on another.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query

Tables Recap
• Select a PartitionKey and RowKey that help scale: efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns: distribute by using a hash etc. as a prefix
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries on "Server Busy": either the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded its limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
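A small sketch of that last point, reusing the AttendeeEntity type from the code samples later in this deck (the table name and lookup method are mine):

    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    public static class PointQuerySketch
    {
        public static AttendeeEntity TryGetAttendee(TableServiceContext context, string email)
        {
            // Without this, a point query for a missing entity throws (404 Resource Not Found).
            context.IgnoreResourceNotFoundException = true;

            CloudTableQuery<AttendeeEntity> query =
                (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                 where a.PartitionKey == "SummerSchool" && a.RowKey == email
                 select a).AsTableServiceQuery();

            foreach (AttendeeEntity a in query)
                return a;   // found
            return null;    // not found, and no exception thrown
        }
    }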

Queues: Their Unique Role in Building Reliable, Scalable Applications
• We want roles that work closely together but are not bound together • Tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages • Messages must be serializable as XML
• Messages are limited to 8 KB in size
• Commonly use the work ticket pattern
• Why not simply use a table?

Queue Terminology

Message Lifecycle
[Diagram: a Web Role calls PutMessage to add Msg 1 … Msg 4 to the Queue; Worker Roles call GetMessage (with a timeout) to dequeue messages and RemoveMessage once processing succeeds]

PutMessage (REST):
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response (REST):
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

RemoveMessage (REST):
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1 (see the sketch below).
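A minimal sketch of that polling loop with the StorageClient queue API; the interval bounds are illustrative, not prescribed by the deck:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BackoffPollerSketch
    {
        public static void PollForever(CloudQueue queue)
        {
            TimeSpan interval = TimeSpan.FromSeconds(1);     // start small
            TimeSpan maxInterval = TimeSpan.FromMinutes(1);  // truncate the growth here
            while (true)
            {
                CloudQueueMessage msg = queue.GetMessage();
                if (msg != null)
                {
                    // ... process the message ...
                    queue.DeleteMessage(msg);
                    interval = TimeSpan.FromSeconds(1);      // success: reset the interval
                }
                else
                {
                    Thread.Sleep(interval);                  // empty poll: wait, then double
                    double doubled = Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds);
                    interval = TimeSpan.FromSeconds(doubled);
                }
            }
        }
    }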

Removing Poison Messages

[Diagram: producers P1 and P2 put messages on queue Q; consumers C1 and C2 dequeue them; each message carries a dequeue count]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)

Queues Recap
• No need to deal with failures: make message processing idempotent
• Invisible messages result in out-of-order delivery: do not rely on order
• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages
• Messages > 8 KB: use a blob to store the message data, with a reference in the message, and garbage collect orphaned blobs
• Batch messages
• Dynamically increase/reduce workers: use the message count to scale

Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice: larger, fewer VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up CPU vs. having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
• Data parallelism
• Task parallelism
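A small sketch of both flavors with the .NET 4 Task Parallel Library; the work items are invented placeholders:

    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class ConcurrencySketch
    {
        // Data parallelism: apply the same operation to every item, spread across cores.
        public static void ProcessAll(IEnumerable<string> blobNames)
        {
            Parallel.ForEach(blobNames, blobName =>
            {
                // ... download and process one blob ...
            });
        }

        // Task parallelism: run different operations concurrently and wait for both.
        public static void ProcessIndependentWork()
        {
            Task download = Task.Factory.StartNew(() => { /* ... fetch input ... */ });
            Task cleanup  = Task.Factory.StartNew(() => { /* ... clean up old blobs ... */ });
            Task.WaitAll(download, cleanup);
        }
    }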

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• Trade-off between the risk of failure/poor user experience due to not having excess capacity and the costs of having idling VMs

Performance vs. Cost

Storage Costs
• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• Service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content (see the sketch below)
• All modern browsers can decompress on the fly
• Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
• Use Portable Network Graphics (PNGs)
• Crush your PNGs
• Strip needless metadata
• Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content; Minify JavaScript, Minify CSS, Minify Images]
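A minimal sketch of gzipping a response body in C#; wiring it into your web role's response pipeline (and setting the Content-Encoding header) is left to the application:

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class CompressionSketch
    {
        public static byte[] GzipString(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (MemoryStream output = new MemoryStream())
            {
                // Compress the payload; remember to also send "Content-Encoding: gzip".
                using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);
                }
                return output.ToArray();
            }
        }
    }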

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud
…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web • Hadoop implementation • Hadoop has a long history and has been improved for stability • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure • Designed from the start to use cloud primitives • Built-in fault tolerance • REST-based interface for writing your own clients

Project Daytona - Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern • split the input sequences • query partitions in parallel • merge results together when done
• Follows the general suggested application model • Web Role + Queue + Worker
• With three special considerations • Batch job management • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, AzureBlast: A Case Study of Developing Science Applications on the Cloud, in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010

A simple Split/Join pattern

Leverage the multi-core capability of one instance • argument "-a" of NCBI-BLAST • 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity • Large partitions: load imbalance • Small partitions: unnecessary overheads • NCBI-BLAST overhead • Data transfer overhead
Best practice: test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task • Essentially an estimate of the task run time
• too small: repeated computation
• too large: unnecessarily long waiting time in case of instance failure
Best practice:
• Estimate the value based on the number of pair-bases in the partition and test runs
• Watch out for the 2-hour maximum limitation
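In StorageClient terms, the estimate simply becomes the visibility timeout passed to GetMessage. A sketch, where the estimating function is a stand-in for whatever the profiling test runs suggest:

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public static class VisibilityTimeoutSketch
    {
        static readonly TimeSpan MaxVisibility = TimeSpan.FromHours(2); // queue service maximum

        public static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue, long pairBasesInPartition)
        {
            // EstimateMinutes is a placeholder for a model calibrated by test runs.
            double minutes = EstimateMinutes(pairBasesInPartition);
            TimeSpan visibility = TimeSpan.FromMinutes(minutes);
            if (visibility > MaxVisibility) visibility = MaxVisibility;
            return taskQueue.GetMessage(visibility);
        }

        private static double EstimateMinutes(long pairBases)
        {
            return Math.Max(1.0, pairBases / 1e6); // made-up scaling, for illustration only
        }
    }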

[Diagram: a splitting task fans out into many BLAST tasks, followed by a merging task]

Task size vs. Performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. Performance
• Super-linear speedup with larger-size worker instances
• Primarily due to the memory capability

Task size / Instance size vs. Cost
• The extra-large instance generated the best and the most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler and Scaling Engine, feeding Worker instances through a global dispatch queue; a Database updating Role refreshes the NCBI databases; an Azure Table holds the Job Registry, and Azure Blob storage holds the BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance • Submit jobs • Track a job's status and logs

                      AuthenticationAuthorization based on Live ID

                      The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states

                      Web Portal

                      Web Service

                      Job registration

                      Job Scheduler

                      Job Portal

                      Scaling Engine

                      Job Registry

                      R palustris as a platform for H2 production Eric Shadt SAGE Sam Phattarasukol Harwood Lab UW

                      Blasted ~5000 proteins (700K sequences) bull Against all NCBI non-redundant proteins completed in 30 min

                      bull Against ~5000 proteins from another strain completed in less than 30 sec

                      AzureBLAST significantly saved computing timehellip

                      Discovering Homologs bull Discover the interrelationships of known protein sequences

                      ldquoAll against Allrdquo query bull The database is also the input query

                      bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

                      bull Theoretically 100 billion sequence comparisons

                      Performance estimation bull Based on the sampling-running on one extra-large Azure instance

                      bull Would require 3216731 minutes (61 years) on one desktop

                      One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

                      bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                      bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                      bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                      bull Each segment consists of smaller partitions

                      bull When load imbalances redistribute the load manually

                      5

                      0

                      62 6

                      2 6

                      2 6

                      2 6

                      2 5

                      0 62

                      bull Total size of the output result is ~230GB

                      bull The number of total hits is 1764579487

                      bull Started at March 25th the last task completed on April 8th (10 days compute)

                      bull But based our estimates real working instance time should be 6~8 day

                      bull Look into log data to analyze what took placehellip

                      5

                      0

                      62 6

                      2 6

                      2 6

                      2 6

                      2 5

                      0 62

                      A normal log record should be

                      Otherwise something is wrong (eg task failed to complete)

                      3312010 614 RD00155D3611B0 Executing the task 251523

                      3312010 625 RD00155D3611B0 Execution of task 251523 is done it took 109mins

                      3312010 625 RD00155D3611B0 Executing the task 251553

                      3312010 644 RD00155D3611B0 Execution of task 251553 is done it took 193mins

                      3312010 644 RD00155D3611B0 Executing the task 251600

                      3312010 702 RD00155D3611B0 Execution of task 251600 is done it took 1727 mins

                      3312010 822 RD00155D3611B0 Executing the task 251774

                      3312010 950 RD00155D3611B0 Executing the task 251895

                      3312010 1112 RD00155D3611B0 Execution of task 251895 is done it took 82 mins

                      North Europe Data Center totally 34256 tasks processed

                      All 62 compute nodes lost tasks

                      and then came back in a group

                      This is an Update domain

                      ~30 mins

                      ~ 6 nodes in one group

                      35 Nodes experience blob

                      writing failure at same

                      time

                      West Europe Datacenter 30976 tasks are completed and job was killed

                      A reasonable guess the

                      Fault Domain is working

                      MODISAzure Computing Evapotranspiration (ET) in The Cloud

                      You never miss the water till the well has run dry Irish Proverb

                      ET = Water volume evapotranspired (m3 s-1 m-2)

                      Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                      λv = Latent heat of vaporization (Jg)

                      Rn = Net radiation (W m-2)

                      cp = Specific heat capacity of air (J kg-1 K-1)

                      ρa = dry air density (kg m-3)

                      δq = vapor pressure deficit (Pa)

                      ga = Conductivity of air (inverse of ra) (m s-1)

                      gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

                      γ = Psychrometric constant (γ asymp 66 Pa K-1)

                      Estimating resistanceconductivity across a

                      catchment can be tricky

                      bull Lots of inputs big data reduction

                      bull Some of the inputs are not so simple

                      119864119879 = ∆119877119899 + 120588119886 119888119901 120575119902 119892119886

                      (∆ + 120574 1 + 119892119886 119892119904 )120582120592

                      Penman-Monteith (1964)

                      Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

                      NASA MODIS

                      imagery source

                      archives

                      5 TB (600K files)

                      FLUXNET curated

                      sensor dataset

                      (30GB 960 files)

                      FLUXNET curated

                      field dataset

                      2 KB (1 file)

                      NCEPNCAR

                      ~100MB

                      (4K files)

                      Vegetative

                      clumping

                      ~5MB (1file)

                      Climate

                      classification

                      ~1MB (1file)

                      20 US year = 1 global year

                      Data collection (map) stage

                      bull Downloads requested input tiles

                      from NASA ftp sites

                      bull Includes geospatial lookup for

                      non-sinusoidal tiles that will

                      contribute to a reprojected

                      sinusoidal tile

                      Reprojection (map) stage

                      bull Converts source tile(s) to

                      intermediate result sinusoidal tiles

                      bull Simple nearest neighbor or spline

                      algorithms

                      Derivation reduction stage

                      bull First stage visible to scientist

                      bull Computes ET in our initial use

                      Analysis reduction stage

                      bull Optional second stage visible to

                      scientist

                      bull Enables production of science

                      analysis artifacts such as maps

                      tables virtual sensors

                      Reduction 1

                      Queue

                      Source

                      Metadata

                      AzureMODIS

                      Service Web Role Portal

                      Request

                      Queue

                      Scientific

                      Results

                      Download

                      Data Collection Stage

                      Source Imagery Download Sites

                      Reprojection

                      Queue

                      Reduction 2

                      Queue

                      Download

                      Queue

                      Scientists

                      Science results

                      Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

                      bull ModisAzure Service is the Web Role front door bull Receives all user requests

                      bull Queues request to appropriate Download Reprojection or Reduction Job Queue

                      bull Service Monitor is a dedicated Worker Role bull Parses all job requests into tasks ndash

                      recoverable units of work

                      bull Execution status of all jobs and tasks persisted in Tables

                      ltPipelineStagegt

                      Request

                      hellip ltPipelineStagegtJobStatus

                      Persist ltPipelineStagegtJob Queue

                      MODISAzure Service

                      (Web Role)

                      Service Monitor

                      (Worker Role)

                      Parse amp Persist ltPipelineStagegtTaskStatus

                      hellip

                      Dispatch

                      ltPipelineStagegtTask Queue

                      All work actually done by a Worker Role

                      bull Sandboxes science or other executable

bull Marshals all storage to/from Azure blob storage and to/from local Azure Worker instance files

                      Service Monitor

                      (Worker Role)

                      Parse amp Persist ltPipelineStagegtTaskStatus

                      GenericWorker

                      (Worker Role)

                      hellip

                      hellip

                      Dispatch

                      ltPipelineStagegtTask Queue

                      hellip

                      ltInputgtData Storage

                      bull Dequeues tasks created by the Service Monitor

                      bull Retries failed tasks 3 times

                      bull Maintains all task status

                      Reprojection Request

                      hellip

                      Service Monitor

                      (Worker Role)

                      ReprojectionJobStatus Persist

                      Parse amp Persist ReprojectionTaskStatus

                      GenericWorker

                      (Worker Role)

                      hellip

                      Job Queue

                      hellip

                      Dispatch

                      Task Queue

                      Points to

                      hellip

                      ScanTimeList

                      SwathGranuleMeta

                      Reprojection Data

                      Storage

                      Each entity specifies a

                      single reprojection job

                      request

                      Each entity specifies a

                      single reprojection task (ie

                      a single tile)

                      Query this table to get

                      geo-metadata (eg

                      boundaries) for each swath

                      tile

                      Query this table to get the

                      list of satellite scan times

                      that cover a target tile

                      Swath Source

                      Data Storage

                      bull Computational costs driven by data scale and need to run reduction multiple times

                      bull Storage costs driven by data scale and 6 month project duration

                      bull Small with respect to the people costs even at graduate student rates

                      Reduction 1 Queue

                      Source Metadata

                      Request Queue

                      Scientific Results Download

                      Data Collection Stage

                      Source Imagery Download Sites

                      Reprojection Queue

                      Reduction 2 Queue

                      Download

                      Queue

                      Scientists

                      Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

                      400-500 GB

                      60K files

                      10 MBsec

                      11 hours

                      lt10 workers

                      $50 upload

                      $450 storage

                      400 GB

                      45K files

                      3500 hours

                      20-100

                      workers

                      5-7 GB

                      55K files

                      1800 hours

                      20-100

                      workers

                      lt10 GB

                      ~1K files

                      1800 hours

                      20-100

                      workers

                      $420 cpu

                      $60 download

                      $216 cpu

                      $1 download

                      $6 storage

                      $216 cpu

                      $2 download

                      $9 storage

                      AzureMODIS

                      Service Web Role Portal

                      Total $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                      • Day 2 - Azure as a PaaS
                      • Day 2 - Applications

                        Key Components ndash Compute Web Roles

                        Web Front End

                        bull Cloud web server

                        bull Web pages

                        bull Web services

                        You can create the following types

                        bull ASPNET web roles

                        bull ASPNET MVC 2 web roles

                        bull WCF service web roles

                        bull Worker roles

                        bull CGI-based web roles

                        Key Components ndash Compute Worker Roles

                        bull Utility compute

                        bull Windows Server 2008

                        bull Background processing

                        bull Each role can define an amount of local storage

                        bull Protected space on the local drive considered volatile storage

                        bull May communicate with outside services

                        bull Azure Storage

                        bull SQL Azure

                        bull Other Web services

                        bull Can expose external and internal endpoints
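A minimal worker role skeleton may make the above concrete. This is a sketch of the common pattern rather than code from the slides; the setting name "DataConnectionString" and the queue name "workitems" are assumptions.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class WorkerRole : RoleEntryPoint
{
    private CloudQueue workQueue;

    public override bool OnStart()
    {
        // "DataConnectionString" is assumed to be defined in ServiceConfiguration.cscfg.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
        workQueue = account.CreateCloudQueueClient().GetQueueReference("workitems");
        workQueue.CreateIfNotExist();
        return base.OnStart();
    }

    public override void Run()
    {
        while (true)                              // background processing loop
        {
            CloudQueueMessage msg = workQueue.GetMessage();
            if (msg != null)
            {
                // ... do the utility compute work described by the message ...
                workQueue.DeleteMessage(msg);
            }
            else
            {
                Thread.Sleep(1000);               // nothing to do yet
            }
        }
    }
}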

                        Suggested Application Model Using queues for reliable messaging

                        Scalable Fault Tolerant Applications

Queues are the application glue
• Decouple parts of the application, easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)

                        Key Components ndash Compute VM Roles

                        bull Customized Role

                        bull You own the box

                        bull How it works

                        bull Download ldquoGuest OSrdquo to Server 2008 Hyper-V

                        bull Customize the OS as you need to

                        bull Upload the differences VHD

                        bull Azure runs your VM role using

                        bull Base OS

                        bull Differences VHD

                        Application Hosting

                        lsquoGrokkingrsquo the service model

                        bull Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

                        bull The service model is the same diagram written down in a declarative format

                        bull You give the Fabric the service model and the binaries that go with each of those nodes

                        bull The Fabric can provision deploy and manage that diagram for you

                        bull Find hardware home

                        bull Copy and launch your app binaries

                        bull Monitor your app and the hardware

                        bull In case of failure take action Perhaps even relocate your app

                        bull At all times the lsquodiagramrsquo stays whole

                        Automated Service Management Provide code + service model

                        bull Platform identifies and allocates resources deploys the service manages service health

                        bull Configuration is handled by two files

ServiceDefinition.csdef

ServiceConfiguration.cscfg
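As a rough illustration of how role code consumes what these two files define (the setting and resource names here are invented for the example):

using Microsoft.WindowsAzure.ServiceRuntime;

public static class ConfigExample
{
    public static void ReadServiceModelValues()
    {
        // A <Setting> declared in ServiceDefinition.csdef and given a value in ServiceConfiguration.cscfg.
        string title = RoleEnvironment.GetConfigurationSettingValue("AppTitle");

        // A <LocalStorage> resource declared in ServiceDefinition.csdef.
        LocalResource scratch = RoleEnvironment.GetLocalResource("ScratchSpace");
        string scratchPath = scratch.RootPath;

        // The instance count configured in ServiceConfiguration.cscfg.
        int instances = RoleEnvironment.CurrentRoleInstance.Role.Instances.Count;
    }
}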

                        Service Definition

                        Service Configuration

                        GUI

                        Double click on Role Name in Azure Project

                        Deploying to the cloud

                        bull We can deploy from the portal or from script

                        bull VS builds two files

                        bull Encrypted package of your code

                        bull Your config file

                        bull You must create an Azure account then a service and then you deploy your code

                        bull Can take up to 20 minutes

                        bull (which is better than six months)

                        Service Management API

• REST-based API to manage your services

• X.509 certificates for authentication

• Lets you create, delete, change, upgrade, swap…

• Lots of community and MSFT-built tools around the API

                        - Easy to roll your own
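For illustration, a bare-bones call to the Service Management API might look like the following sketch; the subscription ID and certificate thumbprint are placeholders, and the x-ms-version value is an assumption about the API version current at the time.

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "<subscription-id>";            // placeholder
        string thumbprint = "<management-cert-thumbprint>";     // placeholder

        // Load the management certificate from the current user's certificate store.
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        var cert = store.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false)[0];
        store.Close();

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2010-10-28");      // assumed API version
        request.ClientCertificates.Add(cert);

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());               // raw XML list of hosted services
        }
    }
}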

                        The Secret Sauce ndash The Fabric

                        The Fabric is the lsquobrainrsquo behind Windows Azure

                        1 Process service model

                        1 Determine resource requirements

                        2 Create role images

                        2 Allocate resources

                        3 Prepare nodes

                        1 Place role images on nodes

                        2 Configure settings

                        3 Start roles

                        4 Configure load balancers

                        5 Maintain service health

                        1 If role fails restart the role based on policy

                        2 If node fails migrate the role based on policy

                        Storage

                        Durable Storage At Massive Scale

                        Blob

                        - Massive files eg videos logs

                        Drive

                        - Use standard file system APIs

                        Tables - Non-relational but with few scale limits

                        - Use SQL Azure for relational data

                        Queues

                        - Facilitate loosely-coupled reliable systems

                        Blob Features and Functions

                        bull Store Large Objects (up to 1TB in size)

                        bull Can be served through Windows Azure CDN service

                        bull Standard REST Interface

                        bull PutBlob

                        bull Inserts a new blob overwrites the existing blob

                        bull GetBlob

                        bull Get whole blob or a specific range

                        bull DeleteBlob

                        bull CopyBlob

                        bull SnapshotBlob

                        bull LeaseBlob

                        Two Types of Blobs Under the Hood

                        bull Block Blob

                        bull Targeted at streaming workloads

                        bull Each blob consists of a sequence of blocks bull Each block is identified by a Block ID

                        bull Size limit 200GB per blob

                        bull Page Blob

                        bull Targeted at random readwrite workloads

                        bull Each blob consists of an array of pages bull Each page is identified by its offset

                        from the start of the blob

                        bull Size limit 1TB per blob

                        Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use

• Use existing NTFS APIs to access a durable drive

• Durability and survival of data on application failover

• Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob

• Example: mount a Page Blob as X:
  http://<accountname>.blob.core.windows.net/<containername>/<blobname>

• All writes to the drive are made durable to the Page Blob

• Drive made durable through standard Page Blob replication

• Drive persists as a Page Blob even when not mounted
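A hedged sketch of mounting a drive with the CloudDrive API from the 1.x SDK; the local resource name, blob path, setting name, and sizes are invented, and the "drives" container is assumed to exist.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class DriveExample
{
    public static string MountDataDrive()
    {
        // Local disk cache backing the mounted drive.
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

        // The drive is just a Page Blob in blob storage.
        CloudDrive drive = account.CreateCloudDrive(account.BlobEndpoint + "/drives/mydata.vhd");
        drive.Create(1024);                                    // 1 GB VHD; throws if it already exists
        string driveLetter = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);

        // driveLetter (e.g. "X:\") can now be used with ordinary System.IO calls.
        return driveLetter;
    }
}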

                        Windows Azure Tables

                        bull Provides Structured Storage

                        bull Massively Scalable Tables bull Billions of entities (rows) and TBs of data

                        bull Can use thousands of servers as traffic grows

                        bull Highly Available amp Durable bull Data is replicated several times

                        bull Familiar and Easy to use API

                        bull WCF Data Services and OData bull NET classes and LINQ

                        bull REST ndash with any platform or language

                        Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery

                        bull Simple asynchronous work dispatch

                        bull Programming semantics ensure that a message can be processed at least once

                        bull Access is provided via REST

                        Storage Partitioning

                        Understanding partitioning is key to understanding performance

• Different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by a single server

• System load balances partitions based on traffic pattern

• Partition key controls entity locality and is the unit of scale

• Load balancing can take a few minutes to kick in

• Can take a couple of seconds for a partition to become available on a different server

System load balancing

• The system load balances to meet your traffic needs

• "Server Busy" means the limits of a single partition have been reached

• Use exponential backoff on "Server Busy" (see the retry sketch below)
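A minimal retry wrapper with truncated exponential back-off, assuming the 1.x StorageClient library; the delay bounds and attempt count are arbitrary example values.

using System;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryHelper
{
    public static void WithBackoff(Action storageCall, int maxAttempts)
    {
        int delayMs = 500;                                    // initial delay (example value)
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                storageCall();
                return;                                       // success
            }
            catch (StorageServerException e)
            {
                // Only retry on "Server Busy" / 503, and only up to maxAttempts.
                if (e.StatusCode != HttpStatusCode.ServiceUnavailable || attempt >= maxAttempts)
                    throw;
                Thread.Sleep(delayMs);
                delayMs = Math.Min(delayMs * 2, 30000);       // double the delay, capped at 30 s
            }
        }
    }
}

// Example usage: RetryHelper.WithBackoff(() => context.SaveChanges(), 6);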

                        Partition Keys In Each Abstraction

• Entities with the same PartitionKey value are served from the same partition. Entities – TableName + PartitionKey

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

                        bull Every blob and its snapshots are in a single partition Blobs ndash Container name + Blob name

                        bullAll messages for a single queue belong to the same partition Messages ndash Queue Name

                        Container Name Blob Name

                        image annarborbighousejpg

                        image foxboroughgillettejpg

                        video annarborbighousejpg

                        Queue Message

                        jobs Message1

                        jobs Message2

                        workflow Message1

                        Scalability Targets

                        Storage Account

                        bull Capacity ndash Up to 100 TBs

                        bull Transactions ndash Up to a few thousand requests per second

                        bull Bandwidth ndash Up to a few hundred megabytes per second

                        Single QueueTable Partition

                        bull Up to 500 transactions per second

To go above these numbers, partition between multiple storage accounts and partitions

When the limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff

                        Single Blob Partition

                        bull Throughput up to 60 MBs

                        PartitionKey (Category)

                        RowKey (Title)

                        Timestamp ReleaseDate

                        Action Fast amp Furious hellip 2009

                        Action The Bourne Ultimatum hellip 2007

                        hellip hellip hellip hellip

                        Animation Open Season 2 hellip 2009

                        Animation The Ant Bully hellip 2006

                        PartitionKey (Category)

                        RowKey (Title)

                        Timestamp ReleaseDate

                        Comedy Office Space hellip 1999

                        hellip hellip hellip hellip

                        SciFi X-Men Origins Wolverine hellip 2009

                        hellip hellip hellip hellip

                        War Defiance hellip 2008

                        PartitionKey (Category)

                        RowKey (Title)

                        Timestamp ReleaseDate

                        Action Fast amp Furious hellip 2009

                        Action The Bourne Ultimatum hellip 2007

                        hellip hellip hellip hellip

                        Animation Open Season 2 hellip 2009

                        Animation The Ant Bully hellip 2006

                        hellip hellip hellip hellip

                        Comedy Office Space hellip 1999

                        hellip hellip hellip hellip

                        SciFi X-Men Origins Wolverine hellip 2009

                        hellip hellip hellip hellip

                        War Defiance hellip 2008

                        Partitions and Partition Ranges

                        Key Selection Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

                        Expect Continuation Tokens ndash Seriously

                        Maximum of 1000 rows in a response

                        At the end of partition range boundary


                        Maximum of 5 seconds to execute the query
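A sketch of following continuation tokens explicitly through WCF Data Services (the CloudTableQuery/AsTableServiceQuery() helper used elsewhere in these slides follows them for you); the table name and entity type reuse the Attendees example shown later.

using System.Collections.Generic;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationExample
{
    public static List<AttendeeEntity> ReadAll(TableServiceContext context)
    {
        var results = new List<AttendeeEntity>();
        var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");

        var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
        DataServiceQueryContinuation<AttendeeEntity> token;
        do
        {
            foreach (AttendeeEntity entity in response)
                results.Add(entity);

            // A non-null continuation means the server stopped early (1000 rows,
            // a partition range boundary, or the 5-second limit) and more rows remain.
            token = response.GetContinuation();
            if (token != null)
                response = context.Execute(token);
        }
        while (token != null);

        return results;
    }
}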

                        Tables Recap bullEfficient for frequently used queries

                        bullSupports batch transactions

                        bullDistributes load

                        Select PartitionKey and RowKey that help scale

                        Avoid ldquoAppend onlyrdquo patterns

                        Always Handle continuation tokens

                        ldquoORrdquo predicates are not optimized

                        Implement back-off strategy for retries

                        bullDistribute by using a hash etc as prefix

                        bullExpect continuation tokens for range queries

                        bullExecute the queries that form the ldquoORrdquo predicates as separate queries

                        bullServer busy

                        bullLoad balance partitions to meet traffic needs

                        bullLoad on single partition has exceeded the limits

                        WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
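A tiny sketch of those three tips together, reusing the AttendeeEntity type from the code shown later; "SummerSchool" matches that example, while the email parameter is a placeholder.

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryExample
{
    public static AttendeeEntity FindAttendee(CloudTableClient client, string email)
    {
        // New context for each logical operation.
        TableServiceContext context = client.GetDataServiceContext();

        // Without this, a point query for a missing entity throws (404).
        context.IgnoreResourceNotFoundException = true;

        // Point query: both PartitionKey and RowKey are specified.
        var query = from a in context.CreateQuery<AttendeeEntity>("Attendees")
                    where a.PartitionKey == "SummerSchool" && a.RowKey == email
                    select a;

        foreach (AttendeeEntity a in query)
            return a;                 // at most one entity matches a point query
        return null;                  // not found
    }
}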

                        Queues Their Unique Role in Building Reliable Scalable Applications

• Want roles that work closely together, but are not bound together; tight coupling leads to brittleness

• This can aid in scaling and performance

• A queue can hold an unlimited number of messages

• Messages must be serializable as XML and are limited to 8 KB in size

• Commonly use the work ticket pattern

• Why not simply use a table?

                        Queue Terminology

                        Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1–4 to a Queue; Worker Roles call GetMessage (with a timeout) to receive messages and RemoveMessage when done]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
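The same lifecycle can be driven through the 1.x StorageClient library instead of raw REST; this sketch reuses the AccountInformation class shown later, and the message body is just an example value.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueLifecycleExample
{
    public static void Run()
    {
        CloudQueueClient queueClient =
            new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("myqueue");
        queue.CreateIfNotExist();

        queue.AddMessage(new CloudQueueMessage("<test>example</test>"));      // PutMessage

        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));   // GetMessage with a 30 s visibility timeout
        if (msg != null)
        {
            // ... process the work ticket while it is invisible to other workers ...
            queue.DeleteMessage(msg);                                         // RemoveMessage (uses the pop receipt carried by msg)
        }
    }
}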

                        Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll increases the interval by 2x; a successful poll sets the interval back to 1.

[Diagram: two consumers, C1 and C2, polling a queue with truncated exponential back-off]

Removing Poison Messages

[Diagram: producers P1, P2 and consumers C1, C2 working against a queue; the numbered steps below walk through the scenario]

1. GetMessage(Q, 30 s) → msg 1 (C1)
2. GetMessage(Q, 30 s) → msg 2 (C2)
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. GetMessage(Q, 30 s) → msg 1 (C2)
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. GetMessage(Q, 30 s) → msg 1 (C1)
12. DequeueCount > 2
13. DeleteMessage(Q, msg 1): the poison message is removed
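A sketch tying the two ideas above together: a polling loop with truncated exponential back-off plus a dequeue-count threshold for poison messages. The queue object and the processMessage delegate are assumed to be supplied by the caller, and the limits are example values.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class PollingWorker
{
    public static void Run(CloudQueue queue, Action<CloudQueueMessage> processMessage)
    {
        int delaySeconds = 1;
        const int maxDelaySeconds = 64;
        const int maxDequeueCount = 3;

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(delaySeconds));
                delaySeconds = Math.Min(delaySeconds * 2, maxDelaySeconds);   // empty poll: double the interval
                continue;
            }
            delaySeconds = 1;                                                 // successful poll: reset the interval

            if (msg.DequeueCount > maxDequeueCount)
            {
                queue.DeleteMessage(msg);                                     // drop the poison message
                continue;                                                     // (could also park it in a blob for inspection)
            }

            processMessage(msg);                                              // must be idempotent
            queue.DeleteMessage(msg);
        }
    }
}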

                        Queues Recap

• No need to deal with failures: make message processing idempotent

• Invisible messages result in out-of-order delivery: do not rely on order

• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages

• Messages > 8 KB, batched messages, garbage collection of orphaned blobs: use a blob to store the message data, with a reference in the message

• Dynamically increase/reduce workers: use the message count to scale

                        Windows Azure Storage Takeaways

                        Blobs

                        Drives

                        Tables

                        Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                        49

                        A Quick Exercise

…Then let's look at some code and some tools

                        50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                        51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();
            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }
}

                        52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                        53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available.
        return null;
    }
}

                        54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                        55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

                        56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                        57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                        58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception.
    }
}

                        59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                        60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                        61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                        62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

                        63

                        Tools ndash Fiddler2

                        Best Practices

                        Picking the Right VM Size

                        bull Having the correct VM size can make a big difference in costs

                        bull Fundamental choice ndash larger fewer VMs vs many smaller instances

                        bull If you scale better than linear across cores larger VMs could save you money

                        bull Pretty rare to see linear scaling across 8 cores

                        bull More instances may provide better uptime and reliability (more failures needed to take your service down)

                        bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                        Using Your VM to the Maximum

                        Remember bull 1 role instance == 1 VM running Windows

                        bull 1 role instance = one specific task for your code

                        bull Yoursquore paying for the entire VM so why not use it

                        bull Common mistake ndash split up code into multiple roles each not using up CPU

                        bull Balance between using up CPU vs having free capacity in times of need

                        bull Multiple ways to use your CPU to the fullest

                        Exploiting Concurrency

                        bull Spin up additional processes each with a specific task or as a unit of concurrency

                        bull May not be ideal if number of active processes exceeds number of cores

                        bull Use multithreading aggressively

                        bull In networking code correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

                        bull In NET 4 use the Task Parallel Library

                        bull Data parallelism

                        bull Task parallelism
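A small illustrative sketch (not from the slides) of both flavors with the .NET 4 Task Parallel Library; the helper methods stand in for real work.

using System.Threading.Tasks;

class ConcurrencyExamples
{
    static void Process(string[] workItems)
    {
        // Data parallelism: process independent items across all cores.
        Parallel.ForEach(workItems, item => HandleItem(item));

        // Task parallelism: run unrelated pieces of work concurrently.
        Task download = Task.Factory.StartNew(() => DownloadInputs());
        Task cleanup  = Task.Factory.StartNew(() => CleanUpLocalCache());
        Task.WaitAll(download, cleanup);
    }

    static void HandleItem(string item) { /* CPU-bound work per item */ }
    static void DownloadInputs() { /* network-bound work */ }
    static void CleanUpLocalCache() { /* disk-bound work */ }
}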

                        Finding Good Code Neighbors

                        bull Typically code falls into one or more of these categories

                        bull Find code that is intensive with different resources to live together

                        bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

                        Memory Intensive

                        CPU Intensive

                        Network IO Intensive

                        Storage IO Intensive

                        Scaling Appropriately

                        bull Monitor your application and make sure yoursquore scaled appropriately (not over-scaled)

                        bull Spinning VMs up and down automatically is good at large scale

                        bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

                        bull Being too aggressive in spinning down VMs can result in poor user experience

                        bull Trade-off between risk of failurepoor user experience due to not having excess capacity and the costs of having idling VMs

                        Performance Cost

                        Storage Costs

                        bull Understand an applicationrsquos storage profile and how storage billing works

                        bull Make service choices based on your app profile

                        bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

                        bull Service choice can make a big cost difference based on your app profile

                        bull Caching and compressing They help a lot with storage costs

                        Saving Bandwidth Costs

                        Bandwidth costs are a huge part of any popular web apprsquos billing profile

                        Sending fewer things over the wire often means getting fewer things from storage

                        Saving bandwidth costs often lead to savings in other places

                        Sending fewer things means your VM has time to do other tasks

                        All of these tips have the side benefit of improving your web apprsquos performance and user experience

                        Compressing Content

1. Gzip all output content

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

                        bull Use Portable Network Graphics (PNGs)

                        bull Crush your PNGs

                        bull Strip needless metadata

                        bull Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content; also minify JavaScript, CSS, and images]
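One way to apply tip 1 in an ASP.NET web role is a response-filter HTTP module; this is a sketch under the assumption that the module is registered in web.config.

using System;
using System.IO.Compression;
using System.Web;

public class GzipModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext ctx = app.Context;
            string acceptEncoding = ctx.Request.Headers["Accept-Encoding"] ?? "";
            if (acceptEncoding.Contains("gzip"))
            {
                // Wrap the output stream so everything written to the response is compressed.
                ctx.Response.Filter = new GZipStream(ctx.Response.Filter, CompressionMode.Compress);
                ctx.Response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}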

                        Best Practices Summary

                        Doing lsquolessrsquo is the key to saving costs

                        Measure everything

                        Know your application profile in and out

                        Research Examples in the Cloud

                        hellipon another set of slides

                        Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                        76

                        Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                        77

                        Questions and Discussionhellip

                        Thank you for hosting me at the Summer School

                        BLAST (Basic Local Alignment Search Tool)

                        bull The most important software in bioinformatics

                        bull Identify similarity between bio-sequences

                        Computationally intensive

                        bull Large number of pairwise alignment operations

bull A BLAST run can take 700–1000 CPU hours

                        bull Sequence databases growing exponentially

                        bull GenBank doubled in size in about 15 months

                        It is easy to parallelize BLAST

                        bull Segment the input

                        bull Segment processing (querying) is pleasingly parallel

                        bull Segment the database (eg mpiBLAST)

                        bull Needs special result reduction processing

                        Large volume data

                        bull A normal Blast database can be as large as 10GB

                        bull 100 nodes means the peak storage bandwidth could reach to 1TB

                        bull The output of BLAST is usually 10-100x larger than the input

                        bull Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done

• Follows the general suggested application model
  • Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)

Best practice: use test runs to profile, and set the partition size to mitigate the overhead (see the sketch below)
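An illustrative sketch (not the AzureBLAST source) of query segmentation with work tickets: split the FASTA input into partitions, upload each to blob storage, and enqueue one message per partition. The container and queue names are invented, and 100 sequences per partition follows the profiling note above.

using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QuerySplitter
{
    public static void SplitAndEnqueue(string fastaPath, CloudStorageAccount account,
                                       int sequencesPerPartition = 100)
    {
        CloudBlobContainer container = account.CreateCloudBlobClient()
                                              .GetContainerReference("blast-input");
        container.CreateIfNotExist();
        CloudQueue taskQueue = account.CreateCloudQueueClient()
                                      .GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        var partition = new List<string>();
        int partitionId = 0;
        foreach (string sequence in ReadFastaSequences(fastaPath))
        {
            partition.Add(sequence);
            if (partition.Count == sequencesPerPartition)
            {
                EnqueuePartition(container, taskQueue, partition, partitionId++);
                partition.Clear();
            }
        }
        if (partition.Count > 0)
            EnqueuePartition(container, taskQueue, partition, partitionId);
    }

    static void EnqueuePartition(CloudBlobContainer container, CloudQueue queue,
                                 List<string> sequences, int id)
    {
        string blobName = "partition-" + id + ".fasta";
        container.GetBlobReference(blobName).UploadText(string.Join("\n", sequences.ToArray()));
        queue.AddMessage(new CloudQueueMessage(blobName));   // work ticket: just a reference to the blob
    }

    static IEnumerable<string> ReadFastaSequences(string path)
    {
        // Minimal FASTA reader: each record starts with a '>' header line.
        string record = null;
        foreach (string line in File.ReadLines(path))
        {
            if (line.StartsWith(">"))
            {
                if (record != null) yield return record;
                record = line;
            }
            else
            {
                record += "\n" + line;
            }
        }
        if (record != null) yield return record;
    }
}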

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of instance failure

Best practice
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation (see the fragment below)
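A minimal fragment showing the timeout in use, assuming the 1.x StorageClient; the 90-minute estimate is only an example value (it stays under the 2-hour cap).

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastDequeueExample
{
    public static void RunOneTask(CloudQueue taskQueue)
    {
        // Estimated from the number of pair-bases in the partition and test runs.
        TimeSpan estimatedRunTime = TimeSpan.FromMinutes(90);

        CloudQueueMessage msg = taskQueue.GetMessage(estimatedRunTime);
        if (msg == null) return;

        // ... run NCBI-BLAST against the partition referenced by the message ...

        taskQueue.DeleteMessage(msg);     // only after the task has really finished
    }
}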

                        BLAST task

                        Splitting task

                        BLAST task

                        BLAST task

                        BLAST task

                        hellip

                        Merging Task

                        Task size vs Performance

                        bull Benefit of the warm cache effect

                        bull 100 sequences per partition is the best choice

                        Instance size vs Performance

                        bull Super-linear speedup with larger size worker instances

                        bull Primarily due to the memory capability

                        Task SizeInstance Size vs Cost

                        bull Extra-large instance generated the best and the most economical throughput

                        bull Fully utilize the resource

                        Web

                        Portal

                        Web

                        Service

                        Job registration

                        Job Scheduler

                        Worker

                        Worker

                        Worker

                        Global

                        dispatch

                        queue

                        Web Role

                        Azure Table

                        Job Management Role

                        Azure Blob

                        Database

                        updating Role

                        hellip

                        Scaling Engine

                        Blast databases

                        temporary data etc)

                        Job Registry NCBI databases

                        BLAST task

                        Splitting task

                        BLAST task

                        BLAST task

                        BLAST task

                        hellip

                        Merging Task

                        ASPNET program hosted by a web role instance bull Submit jobs

                        bull Track jobrsquos status and logs

                        AuthenticationAuthorization based on Live ID

                        The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states

                        Web Portal

                        Web Service

                        Job registration

                        Job Scheduler

                        Job Portal

                        Scaling Engine

                        Job Registry

R. palustris as a platform for H2 production – Eric Schadt (Sage), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

                        AzureBLAST significantly saved computing timehellip

                        Discovering Homologs bull Discover the interrelationships of known protein sequences

                        ldquoAll against Allrdquo query bull The database is also the input query

                        bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

                        bull Theoretically 100 billion sequence comparisons

                        Performance estimation bull Based on the sampling-running on one extra-large Azure instance

                        bull Would require 3216731 minutes (61 years) on one desktop

                        One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

                        bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                        bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                        bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                        bull Each segment consists of smaller partitions

                        bull When load imbalances redistribute the load manually


                        bull Total size of the output result is ~230GB

                        bull The number of total hits is 1764579487

                        bull Started at March 25th the last task completed on April 8th (10 days compute)

bull But based on our estimates, real working instance time should be 6–8 days

                        bull Look into log data to analyze what took placehellip


                        A normal log record should be

                        Otherwise something is wrong (eg task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523

3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins

3/31/2010 6:25 RD00155D3611B0 Executing the task 251553

3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins

3/31/2010 6:44 RD00155D3611B0 Executing the task 251600

3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774

3/31/2010 9:50 RD00155D3611B0 Executing the task 251895

3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: a total of 34,256 tasks processed

All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain (~30 mins, ~6 nodes in one group)

35 nodes experienced a blob-writing failure at the same time

West Europe Data Center: 30,976 tasks were completed and the job was killed

A reasonable guess: the Fault Domain is working

                        MODISAzure Computing Evapotranspiration (ET) in The Cloud

                        You never miss the water till the well has run dry Irish Proverb

                        ET = Water volume evapotranspired (m3 s-1 m-2)

                        Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                        λv = Latent heat of vaporization (Jg)

                        Rn = Net radiation (W m-2)

                        cp = Specific heat capacity of air (J kg-1 K-1)

                        ρa = dry air density (kg m-3)

                        δq = vapor pressure deficit (Pa)

                        ga = Conductivity of air (inverse of ra) (m s-1)

                        gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

                        γ = Psychrometric constant (γ asymp 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

                        bull Lots of inputs big data reduction

                        bull Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                        Penman-Monteith (1964)

                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

NASA MODIS imagery source archives: 5 TB (600K files)

FLUXNET curated sensor dataset: 30 GB (960 files)

FLUXNET curated field dataset: 2 KB (1 file)

NCEP/NCAR: ~100 MB (4K files)

Vegetative clumping: ~5 MB (1 file)

Climate classification: ~1 MB (1 file)

20 US years = 1 global year

                        Data collection (map) stage

                        bull Downloads requested input tiles

                        from NASA ftp sites

                        bull Includes geospatial lookup for

                        non-sinusoidal tiles that will

                        contribute to a reprojected

                        sinusoidal tile

                        Reprojection (map) stage

                        bull Converts source tile(s) to

                        intermediate result sinusoidal tiles

                        bull Simple nearest neighbor or spline

                        algorithms

                        Derivation reduction stage

                        bull First stage visible to scientist

                        bull Computes ET in our initial use

                        Analysis reduction stage

                        bull Optional second stage visible to

                        scientist

                        bull Enables production of science

                        analysis artifacts such as maps

                        tables virtual sensors

                        Reduction 1

                        Queue

                        Source

                        Metadata

                        AzureMODIS

                        Service Web Role Portal

                        Request

                        Queue

                        Scientific

                        Results

                        Download

                        Data Collection Stage

                        Source Imagery Download Sites

                        Reprojection

                        Queue

                        Reduction 2

                        Queue

                        Download

                        Queue

                        Scientists

                        Science results

                        Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, which is serviced by GenericWorker (Worker Role) instances reading from <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage, using the ScanTimeList and SwathGranuleMeta tables.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times

• Storage costs are driven by data scale and the 6 month project duration

• Both are small with respect to the people costs, even at graduate student rates

[Diagram: the MODISAzure pipeline (Request, Download, Reprojection, Reduction 1 and Reduction 2 Queues across the Data Collection, Reprojection, Derivation Reduction and Analysis Reduction Stages) annotated with approximate data volumes and costs.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; costs: $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; costs: $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; costs: $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; costs: $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest scale computer centers ever constructed, and have the potential to be important to both large and small scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data parallel applications, and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                        • Day 2 - Azure as a PaaS
                        • Day 2 - Applications

Key Components – Compute: Worker Roles

• Utility compute
• Windows Server 2008
• Background processing
• Each role can define an amount of local storage
  • Protected space on the local drive, considered volatile storage
• May communicate with outside services
  • Azure Storage
  • SQL Azure
  • Other Web services
• Can expose external and internal endpoints

Suggested Application Model: Using queues for reliable messaging

Scalable, Fault Tolerant Applications

Queues are the application glue (a minimal code sketch follows):
• Decouple parts of the application, so they are easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)
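As a minimal sketch of this producer/consumer pattern with the StorageClient library: the connection string, the "workitems" queue name and the message contents below are illustrative assumptions, not part of the original deck.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class WorkTicketDemo
{
    public static void Main()
    {
        // Example connection string; a real role would read this from ServiceConfiguration.cscfg
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
        queue.CreateIfNotExist();

        // Web role side: enqueue a work ticket (here, the name of a blob to process)
        queue.AddMessage(new CloudQueueMessage("images/photo1.png"));

        // Worker role side: dequeue with a 30 second visibility timeout, process, then delete
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            Console.WriteLine("Processing " + msg.AsString);
            queue.DeleteMessage(msg);   // delete only after successful processing
        }
    }
}

Because the message is deleted only after the work succeeds, a crashed worker simply lets the message reappear for another worker, which is what masks worker faults.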

Key Components – Compute: VM Roles

• Customized Role
  • You own the box
• How it works:
  • Download the "Guest OS" to Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differences VHD
  • Azure runs your VM role using
    • Base OS
    • Differences VHD

Application Hosting

'Grokking' the service model
• Imagine white-boarding out your service architecture, with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy and manage that diagram for you:
  • Find hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action; perhaps even relocate your app
• At all times, the 'diagram' stays whole

Automated Service Management: Provide code + service model

• The platform identifies and allocates resources, deploys the service, and manages service health

• Configuration is handled by two files:
  • ServiceDefinition.csdef (the Service Definition)
  • ServiceConfiguration.cscfg (the Service Configuration)
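As a small illustration of how role code consumes those two files, here is a sketch that reads a setting declared in ServiceDefinition.csdef and valued in ServiceConfiguration.cscfg; the setting name "DataConnectionString" is only an example.

using Microsoft.WindowsAzure.ServiceRuntime;

public static class ConfigReader
{
    // Returns the value set in ServiceConfiguration.cscfg for a setting declared in the .csdef.
    // "DataConnectionString" is a hypothetical setting name.
    public static string GetStorageConnectionString()
    {
        return RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");
    }
}

Because the value lives in the .cscfg, it can be changed on a running deployment without rebuilding or redeploying the package.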

GUI: double click on the Role Name in the Azure Project

Deploying to the cloud
• We can deploy from the portal or from script
• VS builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes (which is better than six months)

Service Management API

• REST based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community and MSFT-built tools around the API; easy to roll your own
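A hedged sketch of rolling your own call against the Service Management REST API with an X509 client certificate: the subscription ID and certificate are your own values, and the "x-ms-version" shown is one of the API versions in use around 2011.

using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

public static class ServiceManagementClient
{
    // Lists the hosted services in a subscription and returns the raw XML response.
    public static string ListHostedServices(string subscriptionId, X509Certificate2 managementCert)
    {
        string uri = "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.Headers.Add("x-ms-version", "2010-10-28");   // required API version header
        request.ClientCertificates.Add(managementCert);       // cert uploaded to the subscription

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();   // XML describing the hosted services
        }
    }
}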

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage At Massive Scale
• Blob – massive files, e.g. videos, logs
• Drive – use standard file system APIs
• Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues – facilitate loosely-coupled, reliable systems

Blob Features and Functions

• Store Large Objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST Interface
  • PutBlob: inserts a new blob or overwrites an existing blob
  • GetBlob: gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob

• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
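A sketch of uploading a block blob explicitly via PutBlock/PutBlockList with the StorageClient library; the "uploads" container name, the 4 MB block size and the block-ID scheme are illustrative choices.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlockUploadDemo
{
    public static void UploadInBlocks(CloudStorageAccount account, string filePath)
    {
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("uploads");   // example name
        container.CreateIfNotExist();
        CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(filePath));

        List<string> blockIds = new List<string>();
        byte[] buffer = new byte[4 * 1024 * 1024];   // 4 MB blocks
        using (FileStream file = File.OpenRead(filePath))
        {
            int read, index = 0;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Block IDs must be base64 strings of equal length within one blob
                string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("d6")));
                blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
                blockIds.Add(blockId);
                index++;
            }
        }
        blob.PutBlockList(blockIds);   // commit the uploaded blocks as the blob content
    }
}

Uncommitted blocks can be uploaded in parallel and retried individually, which is why block blobs suit streaming and large-file upload workloads.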

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob
  • The drive is made durable through standard Page Blob replication
  • The drive persists even when not mounted, as a Page Blob

Windows Azure Tables

• Provides Structured Storage
  • Massively scalable tables: billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly Available & Durable
  • Data is replicated several times
• Familiar and Easy to use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues

• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple, asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance
• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"; it means either the system is load balancing to meet your traffic needs, or single partition limits have been reached

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order - 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order - 3             |              |                     | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When the limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (a sketch follows).
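A hedged sketch of that backoff advice: in the StorageClient library a throttled or failing server-side call surfaces as a StorageServerException, and the delays and attempt cap below are arbitrary example values.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryHelper
{
    // Retry a storage operation with exponential backoff when the service is throttling
    // ("503 Server Busy" surfaces as a StorageServerException).
    public static void ExecuteWithBackoff(Action operation, int maxAttempts = 5)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(500);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (StorageServerException)
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
                // double the delay each time, capped at 30 seconds
                delay = TimeSpan.FromMilliseconds(Math.Min(delay.TotalMilliseconds * 2, 30000));
            }
        }
    }
}

Usage would look like RetryHelper.ExecuteWithBackoff(() => queue.AddMessage(new CloudQueueMessage("work"))).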

Partitions and Partition Ranges

Example: a Movies table partitioned by Category (PartitionKey), with Title as RowKey:

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

The same table can be served as a single partition range or split into partition ranges (e.g. Action through Animation on one server and Comedy through War on another) as the system load balances.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

Continuation tokens are returned:
• At a maximum of 1000 rows in a response
• At the end of a partition range boundary
• At a maximum of 5 seconds of query execution time
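A sketch of explicitly following continuation tokens with the WCF Data Services client; it borrows the AttendeeEntity type and "Attendees" table from the sample code later in this deck. (The CloudTableQuery returned by AsTableServiceQuery(), used in that sample, follows continuations automatically.)

using System.Collections.Generic;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationDemo
{
    // Reads a whole table page by page, explicitly following continuation tokens.
    public static List<AttendeeEntity> ReadAll(TableServiceContext context)
    {
        var results = new List<AttendeeEntity>();
        DataServiceQuery<AttendeeEntity> query = context.CreateQuery<AttendeeEntity>("Attendees");

        var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
        results.AddRange(response);

        DataServiceQueryContinuation<AttendeeEntity> token;
        while ((token = response.GetContinuation()) != null)
        {
            // the token carries the NextPartitionKey/NextRowKey returned by the service
            response = context.Execute(token);
            results.AddRange(response);
        }
        return results;
    }
}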

Tables Recap

• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Guidance:
• Select a PartitionKey and RowKey that help scale; distribute by using a hash etc. as a prefix
• Avoid "append only" patterns
• Always handle continuation tokens; expect them for range queries
• "OR" predicates are not optimized; execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server busy" means the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
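A sketch of that last tip, again borrowing AttendeeEntity and the "Attendees" table from the later sample code; with the flag set, a missing entity comes back as null instead of a 404 exception.

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryDemo
{
    // Point query by PartitionKey + RowKey; returns null when the entity does not exist.
    public static AttendeeEntity FindAttendee(TableServiceContext context, string partitionKey, string rowKey)
    {
        context.IgnoreResourceNotFoundException = true;
        return context.CreateQuery<AttendeeEntity>("Attendees")
                      .Where(a => a.PartitionKey == partitionKey && a.RowKey == rowKey)
                      .FirstOrDefault();
    }
}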

Queues: Their Unique Role in Building Reliable, Scalable Applications

• We want roles that work closely together, but are not bound together
  • Tight coupling leads to brittleness
  • This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
  • Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology: Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a Queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue messages and RemoveMessage to delete them once processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1. (A minimal sketch follows.)
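A minimal sketch of that polling loop; the 1 second floor, 60 second cap (the "truncation") and 30 second visibility timeout are example values.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class PollingWorker
{
    public static void PollForever(CloudQueue queue)
    {
        TimeSpan minDelay = TimeSpan.FromSeconds(1);
        TimeSpan maxDelay = TimeSpan.FromSeconds(60);
        TimeSpan delay = minDelay;

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                // ... process msg here ...
                queue.DeleteMessage(msg);
                delay = minDelay;                       // successful poll: reset the interval
            }
            else
            {
                Thread.Sleep(delay);                    // empty poll: wait, then double the interval
                double doubled = delay.TotalSeconds * 2;
                delay = TimeSpan.FromSeconds(Math.Min(doubled, maxDelay.TotalSeconds));
            }
        }
    }
}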

Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them; the queue tracks a per-message dequeue count.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) to remove the poison message
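A sketch of the dequeue-count check from the walkthrough above, using the CloudQueueMessage.DequeueCount property; the threshold of 3 is an example.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageDemo
{
    public static void ProcessOne(CloudQueue queue, int maxDequeueCount = 3)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > maxDequeueCount)
        {
            // poison message: log it or copy it somewhere for inspection, then remove it
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing goes here ...
        queue.DeleteMessage(msg);
    }
}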

Queues Recap

• Make message processing idempotent: then there is no need to deal specially with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• Messages > 8 KB: use a blob to store the message data, with a reference in the message; batch messages; garbage collect orphaned blobs
• Use the message count to scale: dynamically increase/reduce workers

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

… then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

Code – Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• The fundamental choice: larger, fewer VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up the CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (a short sketch follows)
    • Data parallelism
    • Task parallelism
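A small .NET 4 Task Parallel Library sketch illustrating the two styles named above; the work inside the loops is a placeholder.

using System;
using System.Threading.Tasks;

public static class TplDemo
{
    public static void Main()
    {
        // Data parallelism: partition a loop across the cores of the VM
        int[] results = new int[1000];
        Parallel.For(0, results.Length, i =>
        {
            results[i] = i * i;   // stand-in for real per-item work
        });

        // Task parallelism: run independent units of work concurrently
        Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading..."));
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing..."));
        Task.WaitAll(download, compress);
    }
}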

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network IO intensive, storage IO intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs (performance vs. cost)

Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (a minimal ASP.NET sketch follows)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript/CSS minification and image minification to become compressed content.]
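A minimal sketch of gzipping ASP.NET output with System.IO.Compression.GZipStream, assuming the client advertises gzip support; a production module would also consider caching headers and already-compressed content types.

using System.IO.Compression;
using System.Web;

public class GzipModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext context = app.Context;
            string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;
            if (acceptEncoding.Contains("gzip"))
            {
                // Compress the response stream on the fly and tell the browser how it is encoded
                context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
                context.Response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}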

                          Best Practices Summary

Doing 'less' is the key to saving costs

                          Measure everything

                          Know your application profile in and out

                          Research Examples in the Cloud

… on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                          Project Daytona - Map Reduce on Azure

                          httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx


Questions and Discussion…

                          Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
  • 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multi-core capability of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for small, medium, large and extra large instance sizes

Task granularity
• Large partition: load imbalance
• Small partition: unnecessary overheads (NCBI-BLAST overhead, data transferring overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs; watch out for the 2-hour maximum limitation

[Diagram: a Splitting task fans out into many parallel BLAST tasks, whose outputs are combined by a Merging task.]

Task size vs. Performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. Performance
• Super-linear speedup with larger size worker instances
• Primarily due to the memory capability

Task Size/Instance Size vs. Cost
• Extra-large instances generated the best and the most economical throughput
• Fully utilize the resource

[Diagram: AzureBLAST architecture. A Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and records jobs in the Job Registry (Azure Table); a global dispatch queue feeds the Worker instances; a database updating Role refreshes the NCBI databases; Azure Blob storage holds the BLAST databases, temporary data, etc.]

[Diagram: the Splitting task, parallel BLAST tasks and Merging task mapped onto the architecture above.]

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored into the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the Job Portal (Web Portal + Web Service) performs job registration with the Job Scheduler, Scaling Engine and Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB in size)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on a sampling run on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances appear, redistribute the load manually

[Diagram: instance allocation across the 8 deployments (groups of roughly 50 and 62 nodes).]

• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…

A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: in total, 34,256 tasks processed
• All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, over ~30 mins); this is an Update Domain
• 35 nodes experienced blob writing failure at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the Fault Domain is working

                          MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish Proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J/g)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = \frac{\Delta R_n + \rho_a c_p\,\delta q\,g_a}{\bigl(\Delta + \gamma\,(1 + g_a/g_s)\bigr)\,\lambda_v}

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration or evaporation through plant membranes by plants

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

                          20 US year = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Diagram: the AzureMODIS Service Web Role Portal takes requests from scientists and feeds the Request, Download, Reprojection, Reduction 1 and Reduction 2 Queues; source imagery download sites feed the Data Collection Stage, which is followed by the Reprojection, Derivation Reduction and Analysis Reduction Stages; source metadata is kept in tables and scientists download scientific results from the portal.]

                          httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role, which:
• Sandboxes science or other executables
• Marshalls all storage from/to Azure blob storage and to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances pull from the <PipelineStage>Task Queue and read the <Input>Data Storage.]

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances consume the tasks, reading Swath Source Data Storage and writing Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Cost-annotated version of the AzureMODIS pipeline diagram, summarized below.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                          • Day 2 - Azure as a PaaS
                          • Day 2 - Applications

Suggested Application Model: using queues for reliable messaging

Scalable, Fault-Tolerant Applications

Queues are the application glue:
• Decouple parts of the application, so they are easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)

Key Components – Compute: VM Roles
• Customized role: you own the box
• How it works:
  • Download the "Guest OS" to Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differencing VHD
  • Azure runs your VM role using the base OS plus the differencing VHD

Application Hosting

'Grokking' the service model
• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy, and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action; perhaps even relocate your app
  • At all times, the 'diagram' stays whole

Automated Service Management: provide code + service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files (a role can also read its settings at run time; see the sketch below):
  • ServiceDefinition.csdef – the Service Definition
  • ServiceConfiguration.cscfg – the Service Configuration
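A role can read values defined in ServiceConfiguration.cscfg at run time through the RoleEnvironment API. Below is a minimal sketch, assuming a <Setting> named "GreetingText" has been declared in the csdef/cscfg; the setting name is made up for illustration.

using Microsoft.WindowsAzure.ServiceRuntime;

public static class ConfigExample
{
    public static string ReadGreeting()
    {
        // Reads a <Setting> declared in ServiceDefinition.csdef and valued in
        // ServiceConfiguration.cscfg; the value can be changed in the portal
        // without redeploying the package.
        return RoleEnvironment.GetConfigurationSettingValue("GreetingText");
    }
}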

GUI: double-click on the Role name in the Azure project

Deploying to the cloud
• We can deploy from the portal or from a script
• VS builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes (which is better than six months)

Service Management API
• A REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community- and MSFT-built tools are available around the API; it is easy to roll your own

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   a. Determine resource requirements
   b. Create role images
2. Allocate resources
3. Prepare nodes
   a. Place role images on nodes
   b. Configure settings
   c. Start roles
4. Configure load balancers
5. Maintain service health
   a. If a role fails, restart the role based on policy
   b. If a node fails, migrate the role based on policy

Storage

Durable Storage At Massive Scale
• Blobs – massive files, e.g. videos, logs
• Drives – use standard file system APIs
• Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues – facilitate loosely-coupled, reliable systems

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood
• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access the durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
  • http://<accountname>.blob.core.windows.net/<containername>/<blobname>
• All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
• The drive persists as a Page Blob even when not mounted (a mount sketch follows below)
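A minimal mount sketch, assuming the CloudDrive API as shipped with the 1.x SDK (CreateCloudDrive, Create, Mount); the connection string, local cache resource name, and VHD blob name are made up for illustration, and exact names/signatures may differ slightly from the SDK version you use.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class DriveExample
{
    public static string MountDataDrive(string connectionString)
    {
        // Reserve part of the local disk as the drive cache (resource name is hypothetical)
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // Create (once) and mount a small drive backed by a Page Blob (blob name is hypothetical)
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");
        drive.Create(64);  // size in MB; throws if the drive already exists

        // Returns an NTFS path (e.g. "X:\") usable with ordinary System.IO code
        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
    }
}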

Windows Azure Tables
• Provides structured storage
• Massively scalable tables: billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable: data is replicated several times
• Familiar and easy-to-use API:
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues
• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch (see the sketch below)
• The programming semantics ensure that a message can be processed at least once
• Access is provided via REST
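A minimal sketch of the work-ticket pattern using the StorageClient library; the queue name and message text are illustrative.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueExample
{
    public static void Run(CloudStorageAccount account)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("tasks");
        queue.CreateIfNotExist();

        // Producer (e.g. a web role): enqueue a small work ticket
        queue.AddMessage(new CloudQueueMessage("process-item-42"));

        // Consumer (e.g. a worker role): dequeue, process, then delete
        CloudQueueMessage msg = queue.GetMessage();   // message becomes invisible for the default timeout
        if (msg != null)
        {
            Console.WriteLine("Processing " + msg.AsString);
            queue.DeleteMessage(msg);                 // delete only after successful processing
        }
    }
}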

Storage Partitioning

Understanding partitioning is key to understanding performance
• Every data object has a partition key, and the key is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load balances partitions based on traffic patterns
  • Load balancing can take a few minutes to kick in
  • It can take a couple of seconds for a partition to become available on a different server
• The partition key controls entity locality and is the unit of scale
• "Server Busy" means the limits of a single partition have been reached; use exponential backoff, and the system will load balance to meet your traffic needs

Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3             |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarbor/bighouse.jpg
  image          | foxborough/gillette.jpg
  video          | annarbor/bighouse.jpg

• Messages – Queue Name: all messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (see the sketch below).
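A minimal back-off sketch for the '503 Server Busy' case; the operation being retried and the limits chosen here are illustrative, not prescribed by the platform.

using System;
using System.Threading;

public static class BackOff
{
    // Retries an operation with truncated exponential back-off.
    public static void Execute(Action operation, int maxAttempts = 5)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (Exception)   // e.g. a StorageClientException carrying a 503
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
                // Double the delay, but cap it so waits stay bounded
                delay = TimeSpan.FromMilliseconds(Math.Min(delay.TotalMilliseconds * 2, 10000));
            }
        }
    }
}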

Partitions and Partition Ranges

[Example: a movie table partitioned by Category, shown first split across two partition range servers and then as a single range.]

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A query response can stop early and return a continuation token (handled in the sketch below):
• At a maximum of 1000 rows in a response
• At the end of a partition range boundary
• After a maximum of 5 seconds of query execution time
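The StorageClient library can follow continuation tokens for you: wrapping a query with AsTableServiceQuery() (as the TableHelper code later in this deck does) returns a CloudTableQuery<T> whose enumerator transparently issues the follow-up requests. A minimal sketch, using the AttendeeEntity type and "Attendees" table shown later in this deck:

using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationExample
{
    public static List<AttendeeEntity> ReadWholeTable(TableServiceContext context)
    {
        // AsTableServiceQuery() keeps requesting the next page whenever the server
        // returns a continuation token (1000-row limit, partition boundary, or
        // 5-second execution limit).
        CloudTableQuery<AttendeeEntity> query =
            context.CreateQuery<AttendeeEntity>("Attendees").AsTableServiceQuery();
        return query.ToList();
    }
}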

Tables Recap
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load
• Select a PartitionKey and RowKey that help scale: distribute load by using a hash or similar as a prefix, and avoid "append only" patterns
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server busy" means either that partitions are being load balanced to meet traffic needs or that the load on a single partition has exceeded the limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
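A minimal point-query sketch illustrating the IgnoreResourceNotFoundException tip; the table name and key values follow the AttendeeEntity class shown later in this deck.

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryExample
{
    public static AttendeeEntity Find(TableServiceContext context, string email)
    {
        // Without this flag, a point query for a missing entity throws (HTTP 404).
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).FirstOrDefault();
    }
}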

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8KB in size
• Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology

[Message lifecycle diagram: a Web Role calls PutMessage to add messages (Msg 1-4) to a Queue; Worker Roles call GetMessage (with a visibility timeout), process the message, and then call RemoveMessage to delete it.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling
• Consider a backoff polling approach: each empty poll increases the polling interval by 2x, up to some maximum
• A successful poll sets the interval back to 1 (a minimal loop is sketched below)
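A minimal polling-loop sketch of this pattern; the bounds and message handling are illustrative.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class PollingWorker
{
    public static void Run(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(64);   // truncation point

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg == null)
            {
                // Empty poll: back off by doubling the interval, up to the cap
                Thread.Sleep(interval);
                interval = TimeSpan.FromSeconds(Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
                continue;
            }

            // Successful poll: reset the interval and process the message
            interval = TimeSpan.FromSeconds(1);
            // ... process msg here ...
            queue.DeleteMessage(msg);
        }
    }
}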

Removing Poison Messages

[Diagram: producers P1 and P2, consumers C1 and C2, and a queue holding msg 1 and msg 2; the event sequence across the slides is:]
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1); msg 1 is treated as a poison message and removed

Queues Recap
• Make message processing idempotent: then there is no need to deal with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count (see the sketch below)
• Messages > 8KB: use a blob to store the message data, with a reference in the message; batch messages; garbage collect orphaned blobs
• Dynamically increase/reduce workers: use the message count to scale
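A minimal sketch of the dequeue-count check for poison messages; the threshold and the processing step are illustrative.

using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageExample
{
    private const int MaxDequeueCount = 3;   // illustrative threshold

    public static void ProcessOne(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Seen too many times: some consumer keeps failing on it.
            // Remove it (optionally log it or park it elsewhere first).
            queue.DeleteMessage(msg);
            return;
        }

        // ... idempotent processing goes here ...
        queue.DeleteMessage(msg);
    }
}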

Windows Azure Storage Takeaways: Blobs, Drives, Tables, and Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

… Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>,
or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory.
This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice: larger, fewer VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You are paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which does not use up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (see the sketch below)
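A minimal .NET 4 sketch showing both flavors; the work items themselves are placeholders.

using System.Threading.Tasks;

public static class TplExample
{
    public static void Run(int[] workItems)
    {
        // Data parallelism: one delegate applied to many items across cores
        Parallel.ForEach(workItems, item =>
        {
            // ... CPU-bound work on 'item' ...
        });

        // Task parallelism: independent pieces of work run concurrently
        Task a = Task.Factory.StartNew(() => { /* ... first task ... */ });
        Task b = Task.Factory.StartNew(() => { /* ... second task ... */ });
        Task.WaitAll(a, b);
    }
}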

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you are scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale (a queue-depth-based sketch follows below)
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• It is a trade-off between performance and cost: the risk of failure or poor user experience from not having excess capacity vs. the cost of having idling VMs
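A minimal sketch of a scaling decision driven by queue depth, assuming the StorageClient's RetrieveApproximateMessageCount; the thresholds are illustrative, and actually adding or removing instances would be done separately through the Service Management API.

using Microsoft.WindowsAzure.StorageClient;

public static class ScalingHint
{
    // Suggest a worker count from the current backlog (purely illustrative policy).
    public static int DesiredWorkers(CloudQueue queue, int current)
    {
        int backlog = queue.RetrieveApproximateMessageCount();

        if (backlog > 1000 && current < 20) return current + 1;   // grow slowly under load
        if (backlog == 0 && current > 2) return current - 1;      // keep a small floor
        return current;                                           // otherwise hold steady
    }
}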

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip plus JavaScript, CSS, and image minification to become compressed content.]
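A minimal gzip sketch using System.IO.Compression; what gets compressed, and how it is served, is up to the application.

using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipExample
{
    public static byte[] Compress(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // Remember to set the Content-Encoding: gzip header when serving this
            return output.ToArray();
        }
    }
}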

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

… on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially: GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST): needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10GB
• With 100 nodes, the peak storage bandwidth could reach 1TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern
• Leverage the multiple cores of one instance: the "-a" argument of NCBI-BLAST, set to 1/2/4/8 for the small, medium, large, and extra-large instance sizes
• Task granularity:
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads (NCBI-BLAST overhead, data transferring overhead)
  • Best practice: do test runs to profile, and set the partition size to mitigate the overhead
• Value of visibilityTimeout for each BLAST task: essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of instance failure
  • Best practice: estimate the value based on the number of pair-bases in the partition and on test runs, and watch out for the 2-hour maximum limitation (see the sketch below)

[Diagram: a splitting task fans out into many BLAST tasks that run in parallel, followed by a merging task.]
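A minimal sketch of dequeuing a BLAST task with an estimated visibility timeout; the estimating coefficient and the cap handling are made up for illustration.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastDequeue
{
    public static CloudQueueMessage NextTask(CloudQueue queue, long pairBasesInPartition)
    {
        // Rough run-time estimate from partition size (coefficient is illustrative)
        double minutes = pairBasesInPartition / 1e7;

        // Stay under the 2-hour maximum visibility timeout
        TimeSpan visibility = TimeSpan.FromMinutes(Math.Min(minutes, 115));

        return queue.GetMessage(visibility);
    }
}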

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instance sizes
• Primarily due to the memory capability

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resource

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches splitting, BLAST, and merging tasks to Worker instances through a global dispatch queue; Azure Tables hold the Job Registry, Azure Blob storage holds the NCBI/BLAST databases and temporary data, and a database-updating Role keeps the BLAST databases current.]

Web Portal and Web Service (job registration)
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization is based on Live ID
• An accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

An "All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• It would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this:

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group; this is an update domain
  • ~30 minutes per group, ~6 nodes in one group
• 35 nodes experienced blob writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and then the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants. The Penman-Monteith (1964) equation estimates it as

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

where:
ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests and feeds the Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues; the Data Collection Stage pulls imagery from the source download sites, followed by the Reprojection, Derivation Reduction, and Analysis Reduction Stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role: it parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request flows to the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues onto the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from the queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus; GenericWorker (Worker Role) instances dispatch from the Task Queue, read Swath Source Data Storage, and write Reprojection Data Storage, with ScanTimeList and SwathGranuleMeta tables alongside.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query SwathGranuleMeta to get geo-metadata (e.g., boundaries) for each swath tile
• Query ScanTimeList to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline architecture diagram repeated with per-stage figures, summarized below.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed, and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                            • Day 2 - Azure as a PaaS
                            • Day 2 - Applications

Scalable, Fault-Tolerant Applications

Queues are the application glue
• Decouple parts of the application, making them easier to scale independently
• Resource allocation: different priority queues and backend servers
• Mask faults in worker roles (reliable messaging)

Key Components – Compute: VM Roles

• Customized role – you own the box
• How it works:
  • Download the "Guest OS" image to Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differences VHD
  • Azure runs your VM role using the base OS plus your differences VHD

Application Hosting

'Grokking' the service model
• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy, and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action, perhaps even relocating your app
  • At all times the 'diagram' stays whole

Automated Service Management

Provide code + service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  • ServiceDefinition.csdef
  • ServiceConfiguration.cscfg
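For orientation, a minimal pair of files might look like the sketch below. The element and attribute names are from memory of the 1.x SDK schema, and the service, role, and setting names are placeholders, so treat it as illustrative rather than canonical.

<!-- ServiceDefinition.csdef: the "shape" of the service -->
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1">
    <InputEndpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </InputEndpoints>
    <ConfigurationSettings>
      <Setting name="DataConnectionString" />
    </ConfigurationSettings>
  </WebRole>
</ServiceDefinition>

<!-- ServiceConfiguration.cscfg: the settings for one deployment of that shape -->
<ServiceConfiguration serviceName="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString"
               value="DefaultEndpointsProtocol=https;AccountName=[account];AccountKey=[key]" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>

Note how the definition declares that a setting and endpoint exist, while the configuration supplies the values and the instance count; changing a value or the count does not require rebuilding the package.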

                              Service Definition

                              Service Configuration

                              GUI

                              Double click on Role Name in Azure Project

Deploying to the cloud

• We can deploy from the portal or from script
• Visual Studio builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes (which is better than six months)

Service Management API

• REST-based API to manage your services
• X509 certificates for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community- and MSFT-built tools around the API – easy to roll your own
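As an illustration, here is a minimal C# sketch of calling the API with a management certificate. The subscription ID and certificate thumbprint are placeholders, and the x-ms-version value is just one of the version strings in use around this time; adjust both for your account.

// Sketch: list hosted services via the Service Management REST API.
using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "[your-subscription-id]";        // placeholder
        string thumbprint     = "[management-cert-thumbprint]";  // placeholder

        // Load the management certificate from the current user's store.
        X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, thumbprint, false)[0];
        store.Close();

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Method = "GET";
        request.Headers.Add("x-ms-version", "2010-10-28");   // example API version string
        request.ClientCertificates.Add(cert);                 // X509 authentication

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());             // XML list of hosted services
        }
    }
}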

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage At Massive Scale

• Blobs – massive files, e.g., videos, logs
• Drives – use standard file system APIs
• Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues – facilitate loosely coupled, reliable systems

Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
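The split shows up directly in the client library. The sketch below assumes the SDK 1.x Microsoft.WindowsAzure.StorageClient assembly and development storage; the container and blob names are placeholders.

// Sketch: a block blob built from committed blocks vs. a page blob written at an offset.
using System;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlobTypesDemo
{
    static void Main()
    {
        CloudBlobClient client = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("demo");
        container.CreateIfNotExist();

        // Block blob: upload blocks, then commit the block list (streaming-friendly).
        CloudBlockBlob blockBlob = container.GetBlockBlobReference("streamed.dat");
        string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes("block-000001"));
        using (var block = new MemoryStream(new byte[4 * 1024 * 1024]))   // one 4 MB block
            blockBlob.PutBlock(blockId, block, null);
        blockBlob.PutBlockList(new[] { blockId });

        // Page blob: fixed-size array of 512-byte pages, written by offset (random I/O-friendly).
        CloudPageBlob pageBlob = container.GetPageBlobReference("random.vhd");
        pageBlob.Create(1024 * 1024);                                      // 1 MB page blob
        using (var page = new MemoryStream(new byte[512]))
            pageBlob.WritePages(page, 0);                                  // write one page at offset 0
    }
}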

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount the Page Blob at http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
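A rough sketch of the mount sequence from inside a role follows. It is based on my recollection of the 1.x CloudDrive API (CreateCloudDrive, InitializeCache, Create, Mount), so double-check the method names and signatures against the SDK; the blob name, drive size, and the "DriveCache" local resource are placeholders that would have to exist in your service definition.

// Sketch only: mount a page blob as an NTFS drive letter from a worker role.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class DriveMounter
{
    public static string MountDataDrive()
    {
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;

        // Reserve local disk for the drive cache (declared as a LocalStorage resource).
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // The drive is just a page blob; create it if needed, then mount it.
        CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");    // placeholder blob
        try { drive.Create(1024); }                                          // size in MB
        catch (CloudDriveException) { /* drive already exists */ }

        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
        // Returns a path such as "X:\" -- use ordinary System.IO calls against it.
    }
}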

Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available and durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST
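A minimal producer/consumer sketch with the SDK 1.x StorageClient library looks like the following; the queue name and message content are placeholders, and the same pattern (a small "work ticket" pointing at a blob) reappears later in this deck.

// Sketch: enqueue a work ticket from a web role, dequeue and delete it in a worker role.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class WorkTicketDemo
{
    static void Main()
    {
        CloudQueue queue = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudQueueClient().GetQueueReference("workitems");
        queue.CreateIfNotExist();

        // Producer: the message carries a reference to the payload, not the payload itself.
        queue.AddMessage(new CloudQueueMessage("school/SummerSchoolAttendees.txt"));

        // Consumer: dequeue with a visibility timeout, process, then delete.
        // If the worker crashes before DeleteMessage, the message reappears after the timeout.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            Console.WriteLine("Processing ticket: " + msg.AsString);
            queue.DeleteMessage(msg);
        }
    }
}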

Storage Partitioning

Understanding partitioning is key to understanding performance.

• Every data object has a partition key, and the key is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy": the system load-balances to meet your traffic needs, and a Server Busy response means a single partition's limits have been reached

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey; entities with the same PartitionKey value are served from the same partition:

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order-1               |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order-3               |              |                     | $10.00

Blobs – Container name + Blob name; every blob and its snapshots are in a single partition:

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

Queues – Queue name; all messages for a single queue belong to the same partition:

  Queue    | Message
  jobs     | Message 1
  jobs     | Message 2
  workflow | Message 1

Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

Example – a movie table with PartitionKey (Category) and RowKey (Title), plus Timestamp and ReleaseDate properties:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | …         | 2009
  Action                  | The Bourne Ultimatum     | …         | 2007
  …                       | …                        | …         | …
  Animation               | Open Season 2            | …         | 2009
  Animation               | The Ant Bully            | …         | 2006
  …                       | …                        | …         | …
  Comedy                  | Office Space             | …         | 1999
  …                       | …                        | …         | …
  SciFi                   | X-Men Origins: Wolverine | …         | 2009
  …                       | …                        | …         | …
  War                     | Defiance                 | …         | 2008

The same table is shown split into partition ranges (Action–Animation and Comedy–War), each of which can be served by a different server.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics and fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A query response may stop early and return a continuation token:
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
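With the SDK 1.x table client, wrapping a query with AsTableServiceQuery() gives you an enumerator that follows continuation tokens automatically, whereas a plain DataServiceQuery stops at the first boundary. A sketch (AttendeeEntity and the "Attendees" table are the ones defined in the code later in this deck):

// Sketch: let CloudTableQuery<T> follow continuation tokens across result segments.
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ContinuationDemo
{
    static void Main()
    {
        CloudTableClient tableClient = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudTableClient();
        TableServiceContext context = tableClient.GetDataServiceContext();

        // The enumerator transparently issues follow-up requests whenever the service
        // returns a continuation token (1000-row limit, partition range boundary,
        // or the 5-second execution limit).
        CloudTableQuery<AttendeeEntity> query =
            (from a in context.CreateQuery<AttendeeEntity>("Attendees")
             where a.PartitionKey == "SummerSchool"
             select a).AsTableServiceQuery();

        int total = query.Execute().Count();   // iterates across all result segments
    }
}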

Tables Recap

• Select a PartitionKey and RowKey that help scale: they make frequently used queries efficient, support batch (entity group) transactions, and distribute load
• Avoid "append only" patterns – distribute writes by using a hash or similar prefix on the key
• Always handle continuation tokens – expect them for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries on "server busy" – the system load-balances partitions to meet traffic needs, and a busy response means the load on a single partition has exceeded its limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
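A short sketch of these tips together (AttendeeEntity and the "Attendees" table come from the code later in this deck; the email address parameter is a placeholder):

// Sketch: a fresh context per logical operation, with point-query misses returning null.
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class PointQueryDemo
{
    static AttendeeEntity FindAttendee(string email)
    {
        CloudTableClient tableClient = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudTableClient();

        // New context per logical operation avoids "entity already tracked" exceptions.
        TableServiceContext context = tableClient.GetDataServiceContext();
        context.IgnoreResourceNotFoundException = true;   // a missing row is not an exception

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).AsTableServiceQuery().Execute().FirstOrDefault();
    }
}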

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together – tight coupling leads to brittleness
• This aids scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
  • Commonly use the work ticket pattern
• Why not simply use a table?

Queue Terminology and Message Lifecycle

[Diagram: a Web Role calls PutMessage to place Msg 1–4 on a queue; Worker Roles call GetMessage with a visibility timeout to receive a message and RemoveMessage (delete) once it has been processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1.
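A minimal sketch of such a polling loop, again assuming the 1.x StorageClient queue API; the queue name and the 64-second cap are arbitrary choices.

// Sketch: truncated exponential back-off polling of a queue.
using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BackoffPoller
{
    static void Main()
    {
        CloudQueue queue = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudQueueClient().GetQueueReference("workitems");
        queue.CreateIfNotExist();

        double delaySeconds = 1;            // start at 1 s
        const double maxDelaySeconds = 64;  // truncation cap (tunable)

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(delaySeconds));          // empty poll: back off
                delaySeconds = Math.Min(delaySeconds * 2, maxDelaySeconds);
            }
            else
            {
                Console.WriteLine("Got " + msg.AsString);
                queue.DeleteMessage(msg);
                delaySeconds = 1;                                          // success: reset interval
            }
        }
    }
}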


Removing Poison Messages

[Diagram: producers P1 and P2 feed a queue consumed by C1 and C2; the counters next to each message track its dequeue count.]

Scenario:
1. C1 calls GetMessage(Q, 30 s) and receives msg 1
2. C2 calls GetMessage(Q, 30 s) and receives msg 2
3. C2 consumes msg 2
4. C2 calls DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2 calls GetMessage(Q, 30 s) and receives msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1 calls GetMessage(Q, 30 s) and receives msg 1
12. msg 1's DequeueCount is now greater than 2, so it is treated as a poison message
13. C1 calls DeleteMessage(Q, msg 1) to remove it

Queues Recap

• Make message processing idempotent – then there is no need to deal with failures specially
• Do not rely on message order – invisible messages result in out-of-order delivery
• Use DequeueCount to remove poison messages – enforce a threshold on a message's dequeue count (see the sketch below)
• For messages larger than 8 KB, batch messages or use a blob to store the message data with a reference in the message (and garbage-collect orphaned blobs)
• Use the message count to scale – dynamically increase or reduce the number of workers
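As a sketch of the poison-message rule above: the DequeueCount property carries the information you need. The threshold, queue names, and Process method here are placeholders (SDK 1.x StorageClient assumed).

// Sketch: park a repeatedly failing message on a side queue instead of retrying forever.
using System;
using Microsoft.WindowsAzure.StorageClient;

class PoisonMessageHandling
{
    static void ProcessOneMessage(CloudQueue queue, CloudQueue poisonQueue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > 3)                                        // threshold is a policy choice
        {
            poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString)); // keep it for inspection
            queue.DeleteMessage(msg);
            return;
        }

        Process(msg);                                                    // idempotent work
        queue.DeleteMessage(msg);
    }

    static void Process(CloudQueueMessage msg) { /* application-specific, idempotent */ }
}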

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities.cs

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …

    public void UpdateFrom(AttendeeEntity other)
    {
        FirstName = other.FirstName;
        LastName = other.LastName;
        Email = other.Email;
        Birthday = other.Birthday;
        FavoriteIceCream = other.FavoriteIceCream;
        YearsInPhD = other.YearsInPhD;
        Graduated = other.Graduated;
        UpdateKeys();
    }

    public void UpdateKeys()
    {
        PartitionKey = "SummerSchool";
        RowKey = Email;
    }
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();
        CloudTableQuery<AttendeeEntity> query =
            Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
        try
        {
            foreach (AttendeeEntity attendee in query)
                allAttendees[attendee.Email] = attendee;
        }
        catch (Exception)
        {
            // No entries in the table - or some other exception.
        }
    }

    public void DeleteAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (!allAttendees.ContainsKey(email))
            return;

        AttendeeEntity attendee = allAttendees[email];

        // Delete from the cloud table.
        Context.DeleteObject(attendee);
        Context.SaveChanges();

        // Delete from the memory cache.
        allAttendees.Remove(email);
    }

    public AttendeeEntity GetAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (allAttendees.ContainsKey(email))
            return allAttendees[email];
        return null;
    }
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money (it is pretty rare to see linear scaling across 8 cores)
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting up code into multiple roles, each not using much CPU
• Balance using up CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (see the sketch below)
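A minimal sketch of both styles with the .NET 4 Task Parallel Library; the method bodies and file names are placeholders.

// Sketch: data parallelism with Parallel.ForEach, task parallelism with Task.Factory.
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        var tiles = new List<string> { "tile1.hdf", "tile2.hdf", "tile3.hdf" };

        // Data parallelism: the same operation over many items; the scheduler picks the degree.
        Parallel.ForEach(tiles, tile => Reproject(tile));

        // Task parallelism: independent operations running concurrently.
        Task download = Task.Factory.StartNew(() => DownloadNextBatch());
        Task cleanup  = Task.Factory.StartNew(() => DeleteTempFiles());
        Task.WaitAll(download, cleanup);
    }

    static void Reproject(string tile)  { /* CPU-bound work */ }
    static void DownloadNextBatch()     { /* network-bound work */ }
    static void DeleteTempFiles()       { /* disk-bound work */ }
}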

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs (performance vs. cost)

Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile
• Saving bandwidth costs often leads to savings in other places: sending fewer things over the wire often means getting fewer things from storage, and sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images.]
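A sketch of item 1 above as an ASP.NET IHttpModule; the class name is a placeholder, and whether you register a module like this in web.config or rely on IIS's built-in compression instead is a deployment choice.

// Sketch: gzip the response when the client advertises support for it.
using System.IO.Compression;
using System.Web;

public class CompressionModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext ctx = app.Context;
            string acceptEncoding = ctx.Request.Headers["Accept-Encoding"] ?? "";
            if (acceptEncoding.Contains("gzip"))
            {
                // Wrap the output stream so everything written to the response is compressed.
                ctx.Response.Filter = new GZipStream(ctx.Response.Filter, CompressionMode.Compress);
                ctx.Response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}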

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially – GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input – segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST) – needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
• Best practice: test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long period of waiting in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs; watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans out into many BLAST tasks, which are followed by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and job registry (Azure Tables); worker roles pull work from a global dispatch queue; Azure Blob storage holds the BLAST/NCBI databases and temporary data, maintained by a database-updating role. A splitting task fans out into BLAST tasks that are combined by a merging task.]

Web Portal and Web Service

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the web portal and web service hand job registrations to the job scheduler, scaling engine, and job registry.]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters – US (2), Western and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should be:

    3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
    3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
    3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
    3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
    3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
    3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

    3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
    3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
    3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back in a group – this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed and then the job was killed

• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

Penman-Monteith (1964):

    ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

where
• ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
• Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
• λv = latent heat of vaporization (J/g)
• Rn = net radiation (W m⁻²)
• cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
• ρa = dry air density (kg m⁻³)
• δq = vapor pressure deficit (Pa)
• ga = conductivity of air (inverse of ra) (m s⁻¹)
• gs = conductivity of plant stoma air (inverse of rs) (m s⁻¹)
• γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Input data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

Scale: 20 US years = 1 global year.

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal receives requests on a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 Queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source imagery comes from download sites, source metadata is kept in storage, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues requests to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: the MODISAzure Service (Web Role) persists each <PipelineStage>Request as <PipelineStage>JobStatus and places it on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances take work from the <PipelineStage>Task Queue and read/write <Input>Data Storage.]

GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: reprojection data flow – a Reprojection Request goes to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches via the Job Queue and Task Queue to GenericWorker (Worker Role) instances backed by Swath Source Data Storage and Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Cost diagram: the same pipeline (Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages around the AzureMODIS Service Web Role Portal and its Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues) annotated with data volumes and costs.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                              • Day 2 - Azure as a PaaS
                              • Day 2 - Applications

                                Key Components ndash Compute VM Roles

• Customized Role
  • You own the box
• How it works:
  • Download the "Guest OS" to Server 2008 Hyper-V
  • Customize the OS as you need to
  • Upload the differences VHD
• Azure runs your VM Role using
  • the base OS
  • the differences VHD

                                Application Hosting

'Grokking' the service model
• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action – perhaps even relocate your app
• At all times the 'diagram' stays whole

Automated Service Management: provide code + service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files (minimal sketches of both follow):
  • ServiceDefinition.csdef – the Service Definition
  • ServiceConfiguration.cscfg – the Service Configuration

                                GUI

                                Double click on Role Name in Azure Project

                                Deploying to the cloud

                                bull We can deploy from the portal or from script

                                bull VS builds two files

                                bull Encrypted package of your code

                                bull Your config file

                                bull You must create an Azure account then a service and then you deploy your code

                                bull Can take up to 20 minutes

                                bull (which is better than six months)

                                Service Management API

• REST-based API to manage your services
• X509 certificates for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community and MSFT-built tools are built around the API
  – It is easy to roll your own (a minimal sketch follows)
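A minimal sketch of rolling your own call against the Service Management API: list the hosted services in a subscription. The subscription ID and certificate thumbprint are placeholders, and the x-ms-version value is one of the API versions current at the time; adjust both to your environment.

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "<subscription-id>";              // placeholder
        string certThumbprint = "<management-cert-thumbprint>";   // placeholder

        // The management certificate must already be uploaded to the subscription
        // and installed locally (here: CurrentUser\My).
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, certThumbprint, false)[0];

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2011-02-25");
        request.ClientCertificates.Add(cert);

        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());                // XML list of hosted services
        }
    }
}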

                                The Secret Sauce ndash The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   a. Determine resource requirements
   b. Create role images
2. Allocate resources
3. Prepare nodes
   a. Place role images on nodes
   b. Configure settings
   c. Start roles
4. Configure load balancers
5. Maintain service health
   a. If a role fails, restart the role based on policy
   b. If a node fails, migrate the role based on policy

                                Storage

                                Durable Storage At Massive Scale

                                Blob

                                - Massive files eg videos logs

                                Drive

                                - Use standard file system APIs

                                Tables - Non-relational but with few scale limits

                                - Use SQL Azure for relational data

                                Queues

                                - Facilitate loosely-coupled reliable systems

                                Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

                                Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

                                Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • Drive persists even when not mounted as a Page Blob

                                Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

                                Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

                                Storage Partitioning

Understanding partitioning is key to understanding performance

• Every data object has a partition key; the key is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy" (a minimal retry sketch follows)
  • The system load-balances to meet your traffic needs
  • "Server Busy" means the limits of a single partition have been reached
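A minimal sketch of truncated exponential backoff around a storage call; the RetryWithBackoff helper, its parameters, and the attempt limits are illustrative and are not part of the storage client library:

using System;
using System.Net;
using System.Threading;

static class RetryHelper
{
    // Retries a storage call when it fails with 503 "Server Busy",
    // doubling the delay each time up to a truncation point.
    public static void RetryWithBackoff(Action storageCall, int maxAttempts = 5)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(500);   // initial backoff
        TimeSpan maxDelay = TimeSpan.FromSeconds(30);      // truncation point

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                storageCall();
                return;
            }
            catch (WebException e)
            {
                var response = e.Response as HttpWebResponse;
                bool serverBusy = response != null && (int)response.StatusCode == 503;
                if (!serverBusy || attempt >= maxAttempts)
                    throw;                                 // give up and surface the error

                Thread.Sleep(delay);                       // wait, then retry
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}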

                                Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3             |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

• Queues – Queue Name: all messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

                                Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

Example movie table (PartitionKey = Category, RowKey = Title), shown split across partition range boundaries:

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• The PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are the most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A query response can stop short and hand back a continuation token (a sketch of handling this follows):
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
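A minimal sketch using the v1.x StorageClient library; the table name and the AttendeeEntity/AccountInformation types are the ones shown in the code later in this deck:

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ContinuationExample
{
    static void ReadEverything()
    {
        CloudTableClient client =
            new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
        TableServiceContext context = client.GetDataServiceContext();

        // A plain CreateQuery enumeration can stop at 1000 rows, at a partition range
        // boundary, or after 5 seconds. Wrapping it with AsTableServiceQuery() makes the
        // client library follow the continuation tokens until the full result set is read.
        CloudTableQuery<AttendeeEntity> query =
            (from a in context.CreateQuery<AttendeeEntity>("Attendees")
             where a.PartitionKey == "SummerSchool"
             select a).AsTableServiceQuery();

        foreach (AttendeeEntity attendee in query.Execute())
        {
            // process each attendee; continuation is handled inside Execute()
        }
    }
}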

Tables Recap
• Efficient for frequently used queries: select a PartitionKey and RowKey that help you scale
• Supports batch transactions (entity group transactions within a partition)
• Distributes load: avoid "append only" patterns; distribute by using a hash etc. as a prefix
• Always handle continuation tokens; expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• On "Server Busy", implement a back-off strategy for retries: the system load-balances partitions to meet traffic needs, and the error means the load on a single partition has exceeded its limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
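A minimal sketch of the last point, reusing the AttendeeEntity and AccountInformation types shown later in this deck; the table name and email value are illustrative:

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class PointQueryExample
{
    static AttendeeEntity FindAttendee(string email)
    {
        CloudTableClient client =
            new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
        TableServiceContext context = client.GetDataServiceContext();

        // Without this flag, a point query for a missing entity throws a 404 exception;
        // with it, the query simply returns no rows.
        context.IgnoreResourceNotFoundException = true;

        var query = (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                     where a.PartitionKey == "SummerSchool" && a.RowKey == email
                     select a).AsTableServiceQuery();

        return query.Execute().FirstOrDefault();   // null when the entity does not exist
    }
}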

                                Queues Their Unique Role in Building Reliable Scalable Applications

• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
  • Loose coupling can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
• Commonly use the work ticket pattern
• Why not simply use a table?

                                Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; Worker Roles call GetMessage with a visibility timeout to receive messages and RemoveMessage once processing is complete.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
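A minimal sketch of the same lifecycle using the v1.x StorageClient library; the queue name and message text are illustrative, and AccountInformation is the helper class shown later in this deck:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QueueLifecycle
{
    static void Main()
    {
        var account = new CloudStorageAccount(AccountInformation.Credentials, false);
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("myqueue");
        queue.CreateIfNotExist();

        // Producer (e.g., a Web Role): PutMessage
        queue.AddMessage(new CloudQueueMessage("work ticket: blob name or table key"));

        // Consumer (e.g., a Worker Role): GetMessage with a 30-second visibility timeout
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            // ... process the work ticket ...
            queue.DeleteMessage(msg);   // RemoveMessage once processing succeeds
        }
    }
}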

Truncated Exponential Back Off Polling

• Consider a backoff polling approach: each empty poll increases the polling interval by 2x
• A successful poll sets the interval back to 1

[Diagram: consumers C1 and C2 polling the queue, with the polling interval doubling after each empty poll.]
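A minimal sketch of such a polling loop for a worker role; it assumes a CloudQueue like the one created in the earlier queue sketch, and the 1-second/60-second bounds are illustrative:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

static class QueuePolling
{
    // Doubles the sleep interval after each empty poll (truncated at maxInterval)
    // and resets it to 1 second after any successful dequeue.
    public static void PollWithBackoff(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(60);

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                // ... process the work ticket here ...
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);              // reset on success
            }
            else
            {
                Thread.Sleep(interval);                          // back off on an empty poll
                interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
            }
        }
    }
}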

Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them. The walkthrough tracks each message's dequeue count.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2
13. C1: Delete(Q, msg 1) – the poison message is removed (see the sketch below)
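A minimal sketch of that guard inside a worker's dequeue loop; the queue is a CloudQueue as in the earlier sketches and the threshold value is illustrative:

using System;
using Microsoft.WindowsAzure.StorageClient;

static class PoisonMessages
{
    const int MaxDequeueCount = 3;   // illustrative threshold

    public static void ProcessOne(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        // DequeueCount tracks how many times this message has been handed out.
        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Repeatedly failed to process: treat it as poison.
            // Optionally log it or park it in a blob/table before deleting.
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing here ...
        queue.DeleteMessage(msg);    // delete only after successful processing
    }
}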

Queues Recap
• Make message processing idempotent: then there is no special failure handling to do
• Do not rely on message order: invisible messages result in out-of-order delivery
• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages
• Messages > 8 KB: use a blob to store the message data, with a reference in the message, and garbage-collect orphaned blobs
• Batch messages where it helps
• Use the message count to scale: dynamically increase/reduce workers

Windows Azure Storage Takeaways

• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false)
                         .CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();

            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }

    // WriteFileToBlob and GetBlobText follow on the next slides.
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false)
                             .CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

    // The remaining methods follow on the next slides.
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception.
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                Tools ndash Fiddler2

                                Best Practices

                                Picking the Right VM Size

                                bull Having the correct VM size can make a big difference in costs

                                bull Fundamental choice ndash larger fewer VMs vs many smaller instances

                                bull If you scale better than linear across cores larger VMs could save you money

                                bull Pretty rare to see linear scaling across 8 cores

                                bull More instances may provide better uptime and reliability (more failures needed to take your service down)

                                bull Only real right answer ndash experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• A common mistake is splitting up code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                Exploiting Concurrency

                                bull Spin up additional processes each with a specific task or as a unit of concurrency

                                bull May not be ideal if number of active processes exceeds number of cores

                                bull Use multithreading aggressively

                                bull In networking code correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
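A minimal sketch of both TPL styles inside a worker role's processing loop; the ProcessTile and ProcessBlob methods are placeholders for whatever work the role actually does:

using System.Threading.Tasks;

class ConcurrencyExamples
{
    // Data parallelism: apply the same operation to every element, spread across cores.
    static void ReprojectAll(string[] tileNames)
    {
        Parallel.ForEach(tileNames, tile => ProcessTile(tile));
    }

    // Task parallelism: run independent pieces of work concurrently and wait for both.
    static void DoIndependentWork()
    {
        Task download = Task.Factory.StartNew(() => ProcessBlob("input"));
        Task cleanup = Task.Factory.StartNew(() => ProcessBlob("temp"));
        Task.WaitAll(download, cleanup);
    }

    static void ProcessTile(string name) { /* placeholder for real work */ }
    static void ProcessBlob(string name) { /* placeholder for real work */ }
}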

                                Finding Good Code Neighbors

                                bull Typically code falls into one or more of these categories

                                bull Find code that is intensive with different resources to live together

                                bull Example distributed network caches are typically network- and memory-intensive they may be a good neighbor for storage IO-intensive code

                                Memory Intensive

                                CPU Intensive

                                Network IO Intensive

                                Storage IO Intensive

                                Scaling Appropriately

                                bull Monitor your application and make sure yoursquore scaled appropriately (not over-scaled)

                                bull Spinning VMs up and down automatically is good at large scale

                                bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

                                bull Being too aggressive in spinning down VMs can result in poor user experience

                                bull Trade-off between risk of failurepoor user experience due to not having excess capacity and the costs of having idling VMs

                                Performance Cost

                                Storage Costs

                                bull Understand an applicationrsquos storage profile and how storage billing works

                                bull Make service choices based on your app profile

                                bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

                                bull Service choice can make a big cost difference based on your app profile

                                bull Caching and compressing They help a lot with storage costs

                                Saving Bandwidth Costs

                                Bandwidth costs are a huge part of any popular web apprsquos billing profile

                                Sending fewer things over the wire often means getting fewer things from storage

                                Saving bandwidth costs often lead to savings in other places

                                Sending fewer things means your VM has time to do other tasks

                                All of these tips have the side benefit of improving your web apprsquos performance and user experience

Compressing Content

1. Gzip all output content (a minimal ASP.NET sketch follows this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
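A minimal sketch of gzipping output in an ASP.NET web role (assumption: the module still has to be registered in web.config, which is not shown); it compresses the response only when the client advertises gzip support:

using System;
using System.IO.Compression;
using System.Web;

public class CompressionModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext ctx = app.Context;
            string acceptEncoding = ctx.Request.Headers["Accept-Encoding"] ?? "";
            if (acceptEncoding.Contains("gzip"))
            {
                // Wrap the output stream so everything written to the response is compressed.
                ctx.Response.Filter = new GZipStream(ctx.Response.Filter, CompressionMode.Compress);
                ctx.Response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}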

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]

                                Best Practices Summary

                                Doing lsquolessrsquo is the key to saving costs

                                Measure everything

                                Know your application profile in and out

                                Research Examples in the Cloud

                                hellipon another set of slides

                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                Questions and Discussionhellip

                                Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans out into many BLAST tasks, which are combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instance sizes
• Primarily due to the memory capability

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resource

[Diagram: AzureBLAST architecture – a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine against the Job Registry kept in Azure Tables; a global dispatch queue feeds the Worker instances; a database-updating Role refreshes the NCBI databases; Azure Blob storage holds the BLAST databases, temporary data, etc. Each job is a splitting task that fans out into BLAST tasks followed by a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Job Portal (Web Portal + Web Service) handles job registration against the Job Registry; the Job Scheduler and Scaling Engine pick up registered jobs.]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) in four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When the load is imbalanced, redistribute it manually

[Diagram: instance counts for the AzureBLAST deployments across the datacenters.]

• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But, based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Note that task 251774 never logs a completion record.)

North Europe Data Center: in total, 34,256 tasks processed

All 62 compute nodes lost tasks and then came back in a group: this is an Update Domain
• ~30 mins
• ~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe Datacenter: 30,976 tasks were completed before the job was killed.

A reasonable guess: the Fault Domain is working.

                                MODISAzure Computing Evapotranspiration (ET) in The Cloud

                                You never miss the water till the well has run dry Irish Proverb

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.

Input data sources:

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role portal receives requests; work flows through the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues across the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages, pulling source imagery from download sites and source metadata, and delivering scientific results for download to scientists]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local files on the Azure Worker instance

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker roles (Worker Roles) pull tasks from the queue and read <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
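As a rough sketch of this dispatch pattern (not the actual MODISAzure code; the connection-string setting, queue name, and ExecuteTask helper are illustrative), a generic worker's Run loop could look like this:

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorker : RoleEntryPoint
{
    public override void Run()
    {
        // Setting and queue names are illustrative placeholders.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("pipelinestage-tasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // The message stays invisible for 30 minutes while the task runs.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            try
            {
                ExecuteTask(msg.AsString);     // hypothetical: run the executable, marshal blobs
                taskQueue.DeleteMessage(msg);  // delete only after success
            }
            catch (Exception)
            {
                // Leave the message alone: it becomes visible again after the timeout,
                // which gives the "retry a few times" behavior when paired with a
                // DequeueCount check (see the poison-message discussion later).
            }
        }
    }

    private void ExecuteTask(string taskDescription) { /* ... */ }
}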

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus to a table, parses and persists ReprojectionTaskStatus, and dispatches to a task queue consumed by GenericWorker roles. Table callouts: each ReprojectionJobStatus entity specifies a single reprojection job request; each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile); query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile; query the ScanTimeList table to get the list of satellite scan times that cover a target tile. Reprojection Data Storage and Swath Source Data Storage hold the outputs and inputs.]

• Computational costs are driven by data scale and the need to run the reduction multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: the same AzureMODIS pipeline (request, download, reprojection, reduction 1, and reduction 2 queues feeding the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages) annotated with data volumes and costs]

Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                • Day 2 - Azure as a PaaS
                                • Day 2 - Applications

Application Hosting

'Grokking' the service model

• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate

• The service model is the same diagram written down in a declarative format

• You give the Fabric the service model and the binaries that go with each of those nodes

• The Fabric can provision, deploy, and manage that diagram for you
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action; perhaps even relocate your app
  • At all times the 'diagram' stays whole

Automated Service Management

Provide code + a service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  • ServiceDefinition.csdef (the service definition)
  • ServiceConfiguration.cscfg (the service configuration)
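For illustration (the setting name below is just an example, not one defined in this deck), a role can read the values declared in ServiceDefinition.csdef and set in ServiceConfiguration.cscfg at run time through the ServiceRuntime API:

using Microsoft.WindowsAzure.ServiceRuntime;

public static class ConfigReader
{
    // Returns a setting declared in ServiceDefinition.csdef and valued in ServiceConfiguration.cscfg.
    public static string GetSetting(string name)
    {
        if (RoleEnvironment.IsAvailable)                          // true when running under the Fabric
            return RoleEnvironment.GetConfigurationSettingValue(name);
        return null;                                              // e.g. when running as a plain process
    }
}

// Usage (the setting name "DataConnectionString" is illustrative):
//   string conn = ConfigReader.GetSetting("DataConnectionString");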

GUI: double-click on the role name in the Azure project

Deploying to the cloud

• We can deploy from the portal or from script

• VS builds two files:
  • An encrypted package of your code
  • Your config file

• You must create an Azure account, then a service, and then you deploy your code

• Deployment can take up to 20 minutes
  • (which is better than six months)

Service Management API

• REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap…
• Lots of community and MSFT-built tools around the API; easy to roll your own

The Secret Sauce: The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

                                  Storage

                                  Durable Storage At Massive Scale

Blobs: massive files, e.g. videos and logs

Drives: use standard file system APIs

Tables: non-relational, but with few scale limits (use SQL Azure for relational data)

Queues: facilitate loosely coupled, reliable systems

Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob: inserts a new blob or overwrites an existing blob
  • GetBlob: gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob
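Most of these operations are also surfaced by the StorageClient library; a minimal sketch (using development storage and illustrative container/blob names) of put, get, snapshot, copy, and delete:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobOperations
{
    public static void Demo()
    {
        CloudBlobClient client = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("demo");
        container.CreateIfNotExist();

        CloudBlob source = container.GetBlobReference("notes.txt");
        source.UploadText("Hello Azure");                  // PutBlob
        string text = source.DownloadText();               // GetBlob

        CloudBlob snapshot = source.CreateSnapshot();       // SnapshotBlob: read-only point-in-time copy

        CloudBlob copy = container.GetBlobReference("notes-copy.txt");
        copy.CopyFromBlob(source);                          // CopyBlob: server-side copy

        source.Delete();                                    // DeleteBlob
    }
}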

Two Types of Blobs Under the Hood

• Block blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob

• Page blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
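A small sketch of the two blob types through the StorageClient library (the names, sizes, and local path are illustrative):

using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobKinds
{
    public static void Demo()
    {
        CloudBlobClient client = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("demo");
        container.CreateIfNotExist();

        // Block blob: streamed upload, good for media and log files.
        CloudBlockBlob blockBlob = container.GetBlockBlobReference("bigfile.dat");
        using (FileStream fs = File.OpenRead(@"C:\temp\bigfile.dat"))    // path is illustrative
            blockBlob.UploadFromStream(fs);

        // Page blob: fixed-size array of 512-byte pages, good for random read/write
        // (this is what backs a Windows Azure Drive).
        CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");
        pageBlob.Create(1024 * 1024);                                    // 1 MB of zeroed pages
        byte[] page = new byte[512];
        using (MemoryStream ms = new MemoryStream(page))
            pageBlob.WritePages(ms, 0);                                  // write one page at offset 0
    }
}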

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists, as a Page Blob, even when not mounted

Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available and durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData; .NET classes and LINQ
  • REST, with any platform or language

Windows Azure Queues

• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST
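A minimal sketch of the queue round trip with the StorageClient library (development storage; the queue name and message text are illustrative):

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueBasics
{
    public static void Demo()
    {
        CloudQueueClient queueClient = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("jobs");
        queue.CreateIfNotExist();

        // Producer (e.g. a Web Role): enqueue a small work ticket.
        queue.AddMessage(new CloudQueueMessage("process-partition-42"));

        // Consumer (e.g. a Worker Role): the message becomes invisible for the
        // visibility timeout; delete it only after the work has succeeded, which
        // is what gives "at least once" processing.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            // ... do the work described by msg.AsString ...
            queue.DeleteMessage(msg);
        }
    }
}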

Storage Partitioning

Understanding partitioning is key to understanding performance

• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"
• The system load balances to meet your traffic needs
• "Server Busy" can also mean that single-partition limits have been reached

Partition Keys In Each Abstraction

Entities: partition = TableName + PartitionKey; entities with the same PartitionKey value are served from the same partition

    PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
    1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
    1                         | Order - 1             |              |                     | $35.12
    2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
    2                         | Order - 3             |              |                     | $10.00

Blobs: partition = Container name + Blob name; every blob and its snapshots are in a single partition

    Container Name | Blob Name
    image          | annarborbighouse.jpg
    image          | foxboroughgillette.jpg
    video          | annarborbighouse.jpg

Queues: partition = Queue name; all messages for a single queue belong to the same partition

    Queue    | Message
    jobs     | Message1
    jobs     | Message2
    workflow | Message1

Scalability Targets

Storage account
• Capacity: up to 100 TB
• Transactions: up to a few thousand requests per second
• Bandwidth: up to a few hundred megabytes per second

Single queue/table partition
• Up to 500 transactions per second

Single blob partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.
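One way to bake that backoff in, shown here as a sketch rather than the only option, is the StorageClient's built-in retry policy instead of hand-written retry loops:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryConfig
{
    public static CloudTableClient CreateTableClient()
    {
        CloudTableClient tables = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudTableClient();

        // Retry up to 5 times with an exponentially growing delay; the client
        // applies this automatically to transient failures such as "Server Busy".
        tables.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(1));
        return tables;
    }
}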

Partitions and Partition Ranges

Example: a movie table with PartitionKey = Category and RowKey = Title.

    PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
    Action                  | Fast & Furious            | ...       | 2009
    Action                  | The Bourne Ultimatum      | ...       | 2007
    ...                     | ...                       | ...       | ...
    Animation               | Open Season 2             | ...       | 2009
    Animation               | The Ant Bully             | ...       | 2006
    ...                     | ...                       | ...       | ...
    Comedy                  | Office Space              | ...       | 1999
    ...                     | ...                       | ...       | ...
    SciFi                   | X-Men Origins: Wolverine  | ...       | 2009
    ...                     | ...                       | ...       | ...
    War                     | Defiance                  | ...       | 2008

The table can be split into partition ranges (for example Action through Animation, and Comedy through War), and each range can be served by a different server.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• The PartitionKey is critical for scalability

Query efficiency and speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions work across a single partition
• Transaction semantics and reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens - Seriously

You will get a continuation token:
• At a maximum of 1000 rows in a response
• At the end of a partition range boundary
• At a maximum of 5 seconds of query execution time
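When you query through CloudTableQuery (via AsTableServiceQuery, as the table code later in this deck does), continuation tokens are followed for you during enumeration; a sketch with an illustrative MovieEntity type:

using Microsoft.WindowsAzure.StorageClient;

public class MovieEntity : TableServiceEntity
{
    // PartitionKey = Category, RowKey = Title (illustrative schema)
    public string ReleaseDate { get; set; }
}

public static class QueryAll
{
    public static int CountActionMovies(TableServiceContext context)
    {
        CloudTableQuery<MovieEntity> query =
            (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "Action"
             select m).AsTableServiceQuery();

        // Enumerating a CloudTableQuery transparently issues follow-up requests
        // whenever the service returns a continuation token (1000-row limit,
        // partition range boundary, or the 5-second execution limit).
        int count = 0;
        foreach (MovieEntity m in query)
            count++;
        return count;
    }
}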

Tables Recap

• Efficient for frequently used queries, supports batch transactions, and distributes load: select a PartitionKey and RowKey that help you scale
• Distribute load by using a hash or similar prefix; avoid "append only" patterns
• Always handle continuation tokens; expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicate as separate queries
• Implement a back-off strategy for retries on "Server Busy": the system load balances partitions to meet traffic needs, and the error can also mean the load on a single partition has exceeded its limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
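A sketch of two of those tips together (a fresh context per operation and IgnoreResourceNotFoundException), reusing the AttendeeEntity type from the code samples later in this deck:

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQuery
{
    public static AttendeeEntity FindAttendee(CloudTableClient client, string email)
    {
        // A new context per logical operation avoids stale tracking state.
        TableServiceContext context = client.GetDataServiceContext();

        // Without this flag, a point query for a missing row throws;
        // with it, the query simply returns no results.
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).FirstOrDefault();
    }
}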

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
  • Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology and Message Lifecycle

[Diagram: a Web Role calls PutMessage to place messages (Msg 1..Msg 4) on a queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve them and RemoveMessage to delete them once processed]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Backoff Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1.
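A sketch of that truncated exponential backoff around a queue poll (the 64-second cap is an arbitrary choice):

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class BackoffPoller
{
    // Poll a queue, doubling the sleep interval on every empty poll (up to a cap)
    // and resetting it to 1 second as soon as a message arrives.
    public static void Poll(CloudQueue queue, Action<CloudQueueMessage> handle)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(64);   // truncation point (illustrative)

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                handle(msg);
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);        // success: reset
            }
            else
            {
                Thread.Sleep(interval);                    // empty poll: back off
                interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
            }
        }
    }
}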

Removing Poison Messages

[Diagrams: two producers (P1, P2) and two consumers (C1, C2) working against a queue; the annotated steps across the three slides are:]

1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) returns msg 1
12. msg 1's DequeueCount is now > 2
13. C1: Delete(Q, msg 1) removes the poison message
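A sketch of the corresponding consumer logic, using the message's DequeueCount to drop poison messages (DoWork and the threshold of 2 are illustrative):

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonHandling
{
    public static void ProcessOne(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        // A message that keeps reappearing has probably crashed its consumers.
        if (msg.DequeueCount > 2)
        {
            // Optionally log it or park it in a "dead letter" table or queue first.
            queue.DeleteMessage(msg);
            return;
        }

        try
        {
            DoWork(msg.AsString);          // hypothetical work item
            queue.DeleteMessage(msg);      // delete only after success
        }
        catch (Exception)
        {
            // Do nothing: the message becomes visible again after the
            // visibility timeout and will be retried (or removed above).
        }
    }

    private static void DoWork(string payload) { /* ... */ }
}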

Queues Recap

• No need to deal with failures: make message processing idempotent
• Invisible messages can result in out-of-order delivery: do not rely on order
• Enforce a threshold on a message's dequeue count: use DequeueCount to remove poison messages
• Messages larger than 8 KB: use a blob to store the message data, with a reference in the message
• Batch messages, and garbage collect orphaned blobs
• Dynamically increase or reduce workers: use the message count to scale

Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                  49

A Quick Exercise

…Then let's look at some code and some tools

                                  50

Code - AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                  51

Code - BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                  52

Code - BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                  53

Code - BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                  54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                  55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                  56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                  57

Code - TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                  58

Code - TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception.
    }
}

                                  59

Code - TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

                                  60

Code - TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                  61

Pseudo Code - TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                  62

Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                  63

Tools - Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money
  • It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism
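A small .NET 4 sketch of both flavors with the Task Parallel Library (the helper methods are placeholders):

using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    public static void Process(string[] inputFiles)
    {
        // Data parallelism: let the TPL spread the work across all cores of the role instance.
        Parallel.ForEach(inputFiles, file =>
        {
            ProcessFile(file);             // hypothetical CPU-bound step
        });

        // Task parallelism: run independent steps side by side.
        Task cleanup = Task.Factory.StartNew(() => CleanupScratchSpace());
        Task report  = Task.Factory.StartNew(() => WriteReport());
        Task.WaitAll(cleanup, report);
    }

    private static void ProcessFile(string file) { }
    private static void CleanupScratchSpace() { }
    private static void WriteReport() { }
}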

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale
  • Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
  • Being too aggressive in spinning down VMs can result in a poor user experience

• Trade-off (performance vs. cost): the risk of failure or poor user experience from not having excess capacity, against the cost of having idling VMs

Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile

• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile

• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places

• Sending fewer things also means your VM has time to do other tasks

• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
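As one concrete, hedged example of tip 1 in an Azure setting: gzip the content before writing it to blob storage and set Content-Encoding so browsers decompress it on the fly (container and blob names are illustrative):

using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class CompressedUpload
{
    public static void UploadGzipped(CloudBlobContainer container, string blobName, string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);

        using (MemoryStream compressed = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, true))
                gzip.Write(raw, 0, raw.Length);
            compressed.Position = 0;

            CloudBlob blob = container.GetBlobReference(blobName);
            blob.Properties.ContentType = "text/plain";
            blob.Properties.ContentEncoding = "gzip";   // browsers decompress on the fly
            blob.UploadFromStream(compressed);
        }
    }
}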

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images]

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

                                  Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                  76

Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                  77

Questions and Discussion…

Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700~1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• A parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), ACM, 21 June 2010.

A simple split/join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans out into many parallel BLAST tasks, whose outputs feed a merging task]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler and scaling engine and tracks jobs in a job registry kept in Azure Tables; a global dispatch queue feeds the worker instances; a database-updating role refreshes the NCBI databases; Azure Blob storage holds the BLAST databases, temporary data, etc. Each job is executed as a splitting task, many parallel BLAST tasks, and a merging task.]

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization is based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the Web Portal and Web Service hand job registrations to the Job Scheduler, Scaling Engine, and Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• It would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

                                  bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                                  bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                                  bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                                  bull Each segment consists of smaller partitions

                                  bull When load imbalances redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record looks like this:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Task 251774 is started but never reported as done.)

North Europe Data Center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, then the job was killed.
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

where
ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR reanalysis data: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[MODISAzure pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; a Request Queue feeds the Data Collection Stage, which pulls source imagery from the download sites via a Download Queue; the Reprojection Stage (Reprojection Queue), Derivation Reduction Stage (Reduction 1 Queue), and Analysis Reduction Stage (Reduction 2 Queue) follow, with source metadata tracked alongside and scientific results available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks: recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Job flow diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job onto the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage>Task Queue.]

All work is actually done by a GenericWorker (Worker Role):
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and to/from local files on the Azure worker instance
• Dequeues tasks created by the Service Monitor from the <PipelineStage>Task Queue
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read/write the <Input>Data Storage.]

[Reprojection flow diagram: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus for the Job Queue, then parses and persists ReprojectionTaskStatus and dispatches to the Task Queue; GenericWorker (Worker Role) instances consume the tasks, reading Swath Source Data Storage and writing Reprojection Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost-annotated MODISAzure pipeline diagram: AzureMODIS Service Web Role Portal, Request/Download/Reprojection/Reduction 1/Reduction 2 queues, Source Imagery Download Sites, source metadata, and Scientific Results Download for scientists.]

Approximate data volumes and costs per stage:
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                  • Day 2 - Azure as a PaaS
                                  • Day 2 - Applications

'Grokking' the service model
• Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate
• The service model is the same diagram written down in a declarative format
• You give the Fabric the service model and the binaries that go with each of those nodes
• The Fabric can provision, deploy, and manage that diagram for you:
  • Find a hardware home
  • Copy and launch your app binaries
  • Monitor your app and the hardware
  • In case of failure, take action; perhaps even relocate your app
• At all times the 'diagram' stays whole

Automated Service Management: provide code + service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  ServiceDefinition.csdef
  ServiceConfiguration.cscfg
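As a minimal sketch of how a role consumes these files at run time, a setting declared in ServiceDefinition.csdef and valued in ServiceConfiguration.cscfg can be read through the ServiceRuntime API ("DataConnectionString" is a hypothetical setting name):

using Microsoft.WindowsAzure.ServiceRuntime;

public static class Settings
{
    // Reads a setting declared in ServiceDefinition.csdef and assigned a value
    // in ServiceConfiguration.cscfg; "DataConnectionString" is illustrative.
    public static string DataConnectionString
    {
        get
        {
            if (RoleEnvironment.IsAvailable)
            {
                return RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");
            }
            // Fallback when running outside the Azure fabric (e.g., unit tests).
            return "UseDevelopmentStorage=true";
        }
    }
}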

The Service Definition and Service Configuration can also be edited through a GUI: double-click the role name in the Azure project.

Deploying to the cloud
• We can deploy from the portal or from a script
• Visual Studio builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes
• (which is better than six months)

Service Management API
• REST-based API to manage your services
• X.509 certificates for authentication
• Lets you create, delete, change, upgrade, swap, ...
• Lots of community- and MSFT-built tools around the API; easy to roll your own
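As a rough sketch of rolling your own client (the subscription ID and certificate thumbprint are placeholders, and the x-ms-version value depends on the API release you target), listing hosted services looks roughly like this:

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        // Hypothetical subscription ID; the management certificate must be
        // uploaded to the subscription and installed in the local store.
        string subscriptionId = "00000000-0000-0000-0000-000000000000";
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        // Assumes the certificate with this thumbprint is present.
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, "YOUR-CERT-THUMBPRINT", false)[0];

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.ClientCertificates.Add(cert);               // X.509 authentication
        request.Headers.Add("x-ms-version", "2011-02-25");  // required API version header

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());           // XML list of hosted services
        }
    }
}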

The Secret Sauce: The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage: Durable Storage at Massive Scale

Blob
- Massive files, e.g., videos, logs

Drive
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data

Queues
- Facilitate loosely coupled, reliable systems

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob: inserts a new blob or overwrites an existing blob
  • GetBlob: gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood
• Block blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
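A minimal sketch of the two blob types with the StorageClient library of that era (the container and blob names are made up, and AccountInformation is the helper class defined later in this deck):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = new CloudStorageAccount(AccountInformation.Credentials, false);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("demo");
container.CreateIfNotExist();

// Block blob: good for streaming an existing file up as a sequence of blocks.
CloudBlockBlob blockBlob = container.GetBlockBlobReference("video.wmv");
blockBlob.UploadFile(@"C:\data\video.wmv");

// Page blob: fixed-size, 512-byte-aligned pages for random read/write.
CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");
pageBlob.Create(1024 * 1024);                 // reserve 1 MB of pages
using (var stream = new System.IO.MemoryStream(new byte[512]))
{
    pageBlob.WritePages(stream, 0);           // write one page at offset 0
}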

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
• Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
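A rough sketch of mounting a drive with the CloudDrive API; the page blob URI and the "DriveCache" local storage resource are illustrative, and the exact method names changed across SDK releases, so treat this as a sketch rather than a reference:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;
// CloudDrive ships in Microsoft.WindowsAzure.CloudDrive.dll.

var account = new CloudStorageAccount(AccountInformation.Credentials, false);

// Reserve local disk on the role instance as the drive cache; "DriveCache" is a
// hypothetical LocalStorage resource declared in the service definition.
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

// The drive itself is just a page blob holding a VHD.
CloudDrive drive = account.CreateCloudDrive(
    "http://<accountname>.blob.core.windows.net/drives/data.vhd");  // placeholder URI
drive.Create(512);                                    // size in MB; throws if it already exists
string drivePath = drive.Mount(64, DriveMountOptions.None);
// drivePath is a drive letter such as "X:\"; use normal NTFS APIs from here on.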

Windows Azure Tables
• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available and durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST, with any platform or language

Windows Azure Queues
• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message is processed at least once
• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance.
• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy"
  • The system load-balances to meet your traffic needs
  • "Server Busy" means the limits of a single partition have been reached

Partition Keys in Each Abstraction

Entities: TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order - 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order - 3             |              |                     | $10.00

Blobs: Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarborbighouse.jpg
image          | foxboroughgillette.jpg
video          | annarborbighouse.jpg

Messages: Queue name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage account
• Capacity: up to 100 TB
• Transactions: up to a few thousand requests per second
• Bandwidth: up to a few hundred megabytes per second

Single queue/table partition
• Up to 500 transactions per second

Single blob partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

The movie table below is partitioned by PartitionKey (Category); the system can split partition ranges across servers, e.g. one range server holding Action through Animation and another holding Comedy through War.

PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
Action                  | Fast & Furious            | ...       | 2009
Action                  | The Bourne Ultimatum      | ...       | 2007
...                     | ...                       | ...       | ...
Animation               | Open Season 2             | ...       | 2009
Animation               | The Ant Bully             | ...       | 2006
...                     | ...                       | ...       | ...
Comedy                  | Office Space              | ...       | 1999
...                     | ...                       | ...       | ...
SciFi                   | X-Men Origins: Wolverine  | ...       | 2009
...                     | ...                       | ...       | ...
War                     | Defiance                  | ...       | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query efficiency and speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics and reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens. Seriously.

A continuation token is returned when:
• There are more rows than the maximum of 1,000 in a response
• The query reaches the end of a partition range boundary
• The query hits the maximum of 5 seconds to execute
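A sketch of explicitly walking continuation tokens with WCF Data Services against the Attendees table used later in this deck (the AsTableServiceQuery() helper hides this loop for you; the header and query-option names are the documented x-ms-continuation pair):

using System.Collections.Generic;
using System.Data.Services.Client;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public IEnumerable<AttendeeEntity> ReadAllWithContinuation(TableServiceContext context, string tableName)
{
    string nextPartitionKey = null;
    string nextRowKey = null;

    do
    {
        var query = (DataServiceQuery<AttendeeEntity>)
            context.CreateQuery<AttendeeEntity>(tableName).Take(1000);

        // Resume from the previous continuation point, if any.
        if (nextPartitionKey != null)
        {
            query = query.AddQueryOption("NextPartitionKey", nextPartitionKey);
            if (nextRowKey != null)
                query = query.AddQueryOption("NextRowKey", nextRowKey);
        }

        var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
        foreach (AttendeeEntity entity in response)
            yield return entity;

        // The service returns these headers while more results remain.
        response.Headers.TryGetValue("x-ms-continuation-NextPartitionKey", out nextPartitionKey);
        response.Headers.TryGetValue("x-ms-continuation-NextRowKey", out nextRowKey);
    }
    while (nextPartitionKey != null);
}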

Tables Recap
• Select PartitionKey and RowKey that help scale: efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns: distribute by using a hash (or similar) as a key prefix
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicate as separate queries
• Implement a back-off strategy for retries: "server busy" means either the system is load-balancing partitions to meet traffic needs or the load on a single partition has exceeded its limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
• The work ticket pattern is commonly used
• Why not simply use a table?
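A minimal sketch of the work ticket pattern with the StorageClient library (the queue, container, and blob names are made up, and AccountInformation is the helper defined later in this deck): the producer stores the payload in a blob and enqueues a small "ticket" that points to it; a worker dequeues the ticket, processes the blob, and deletes the message only after it succeeds.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

var account = new CloudStorageAccount(AccountInformation.Credentials, false);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudQueueClient queueClient = account.CreateCloudQueueClient();

// Producer (e.g., web role): store the payload in a blob, enqueue a ticket.
CloudBlobContainer container = blobClient.GetContainerReference("workitems");
container.CreateIfNotExist();
CloudBlob payload = container.GetBlobReference("job-42.txt");
payload.UploadText("...large input data...");

CloudQueue queue = queueClient.GetQueueReference("jobs");
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage(payload.Uri.ToString()));   // ticket = pointer to the blob

// Consumer (e.g., worker role Run() loop): dequeue, process, delete.
CloudQueueMessage ticket = queue.GetMessage(TimeSpan.FromMinutes(5));  // invisible for 5 minutes
if (ticket != null)
{
    CloudBlob work = blobClient.GetBlobReference(ticket.AsString);     // AsString holds the blob URI
    string input = work.DownloadText();
    // ... process the input idempotently ...
    queue.DeleteMessage(ticket);   // delete only after successful processing
}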

Queue Terminology and Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1..4 to the queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue a message and RemoveMessage once processing completes; a dequeued message (e.g., Msg 2) is invisible to other workers until its timeout expires.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach (a sketch follows):
• Each empty poll increases the polling interval by 2x
• A successful poll resets the interval back to 1
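A minimal sketch of truncated exponential back-off around GetMessage (the interval bounds are illustrative):

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

void PollWithBackOff(CloudQueue queue)
{
    int delaySeconds = 1;
    const int maxDelaySeconds = 64;   // truncation point (illustrative)

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            // ... process the message, then delete it ...
            queue.DeleteMessage(msg);
            delaySeconds = 1;                        // successful poll: reset interval
        }
        else
        {
            Thread.Sleep(TimeSpan.FromSeconds(delaySeconds));
            delaySeconds = Math.Min(delaySeconds * 2, maxDelaySeconds);  // empty poll: double, capped
        }
    }
}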

Removing Poison Messages

Illustration with producers P1 and P2 and consumers C1 and C2 working against one queue:

1. C1 calls GetMessage(Q, 30 s) and gets msg 1; C2 calls GetMessage(Q, 30 s) and gets msg 2. Each message is invisible to other consumers for 30 seconds.
2. C2 consumes msg 2 and calls DeleteMessage(Q, msg 2). C1 crashes, so msg 1 becomes visible again 30 seconds after its dequeue, and C2 picks it up with another GetMessage(Q, 30 s).
3. C2 also crashes while holding msg 1, which becomes visible again 30 seconds later. C1 restarts, dequeues msg 1 once more, sees that its DequeueCount is greater than 2, and deletes msg 1 as a poison message.
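A sketch of the dequeue-count check from the walkthrough above (the threshold of 2 mirrors the example; in practice you might also park the bad message in a blob or table for inspection before deleting it):

using System;
using Microsoft.WindowsAzure.StorageClient;

void ProcessNextMessage(CloudQueue queue)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg == null) return;

    if (msg.DequeueCount > 2)
    {
        // Poison message: it has already failed processing repeatedly.
        // Optionally log it or copy it somewhere for inspection, then remove it.
        queue.DeleteMessage(msg);
        return;
    }

    // ... normal, idempotent processing ...
    queue.DeleteMessage(msg);
}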

Queues Recap
• No need to deal with failures: make message processing idempotent
• Invisible messages result in out-of-order delivery: do not rely on order
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• Messages larger than 8 KB: use a blob to store the message data, with a reference in the message, and garbage-collect orphaned blobs
• Dynamically increase/reduce workers: use the message count to scale

Windows Azure Storage Takeaways
• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                    49

                                    A Quick Exercise

…then let's look at some code and some tools.

                                    50

Code - AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                    51

Code - BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                    52

Code - BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type
    // (e.g., "text/plain").
}

                                    53

Code - BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                    54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                    55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                    56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                    57

Code - TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                    58

Code - TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception.
    }
}

                                    59

Code - TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

                                    60

Code - TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                    61

Pseudo Code - TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                    62

Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                    63

Tools - Fiddler2

                                    Best Practices

                                    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism (see the sketch after this list)
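A small .NET 4 sketch of both styles (the work-item collection and the methods it calls are hypothetical placeholders):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Data parallelism: process every work item, letting the TPL decide how many
// cores of this VM size to use.
Parallel.ForEach(workItems, item => ProcessItem(item));

// Task parallelism: run independent pieces of work concurrently and wait for all.
Task download = Task.Factory.StartNew(() => DownloadInputTiles());
Task cleanup  = Task.Factory.StartNew(() => DeleteExpiredBlobs());
Task.WaitAll(download, cleanup);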

Finding Good Code Neighbors
• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• It is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs (performance vs. cost)

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places
• Sending fewer things also means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content (a sketch follows this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
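A minimal gzip helper in .NET; how you wire it into your pipeline (e.g., an ASP.NET response filter) is up to you, and the caller is responsible for setting Content-Encoding: gzip so browsers decompress on the fly:

using System.IO;
using System.IO.Compression;

static byte[] GzipBytes(byte[] uncompressed)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            // Compress the whole buffer into the underlying memory stream.
            gzip.Write(uncompressed, 0, uncompressed.Length);
        }
        return output.ToArray();
    }
}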

[Diagram: uncompressed content passes through Gzip to become compressed content; likewise, minify JavaScript, minify CSS, and minify images.]

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile inside and out

                                    Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                    76

                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                    77

Questions and Discussion…

                                    Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

It is computationally intensive:
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

It involves large volumes of data:
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple split/join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overhead
  • NCBI BLAST startup overhead
  • data-transfer overhead
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • too small: repeated computation
  • too large: an unnecessarily long wait before retry when an instance fails
Best practice:
• Estimate the value from the number of pair-bases in the partition and from test runs (a sketch follows below)
• Watch out for the 2-hour maximum limitation
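The following is a minimal sketch of the timeout estimate described above, using the queue API from the 1.x Microsoft.WindowsAzure.StorageClient library. The helper name, the per-megabase rate, and the 20% safety margin are illustrative assumptions, not part of the original AzureBLAST code.

using System;
using Microsoft.WindowsAzure.StorageClient;

static class BlastTaskTimeout
{
    // Estimate a visibility timeout from the number of pair-bases in the
    // partition (rate comes from profiling test runs), then clamp it to the
    // 2-hour maximum mentioned above.
    public static TimeSpan Estimate(long pairBases, double secondsPerMegaBase)
    {
        double seconds = (pairBases / 1e6) * secondsPerMegaBase * 1.2; // 20% safety margin
        TimeSpan estimate = TimeSpan.FromSeconds(seconds);
        TimeSpan max = TimeSpan.FromHours(2);
        return estimate > max ? max : estimate;
    }
}

// Usage inside a worker role loop, where taskQueue is a CloudQueue:
// CloudQueueMessage task = taskQueue.GetMessage(BlastTaskTimeout.Estimate(pairBases, rate));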

(Diagram: a splitting task fans out into many BLAST tasks, which are followed by a merging task.)

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size/instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

(Architecture diagram: a Web Portal and Web Service in a Web Role handle job registration; a Job Management Role hosts the Job Scheduler, Scaling Engine, Job Registry, and database-updating role; a global dispatch queue feeds the Worker instances; Azure Tables hold the job registry, and Azure Blob storage holds the NCBI/BLAST databases and temporary data. A splitting task fans out into many BLAST tasks, followed by a merging task.)

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly reduced computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalance occurred, the load was redistributed manually


• Total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise, something is wrong (e.g., a task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, roughly 30 minutes apart): this is an update domain at work.

35 nodes experienced blob-writing failures at the same time.

West Europe data center: 30,976 tasks were completed, and then the job was killed.

A reasonable guess: the fault domain was at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stomata (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs, big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
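For reference, the Penman-Monteith formula above translates directly into code. This is a minimal sketch with the variables in the units listed above; the function name is illustrative, the default γ comes from the slide, λv defaults to a value commonly used for water (~2450 J/g), and unit bookkeeping is left to the caller.

static double PenmanMonteithET(
    double delta,    // Δ: rate of change of saturation specific humidity with temperature (Pa/K)
    double rn,       // Rn: net radiation (W/m^2)
    double rhoA,     // ρa: dry air density (kg/m^3)
    double cp,       // cp: specific heat capacity of air (J/(kg·K))
    double dq,       // δq: vapor pressure deficit (Pa)
    double ga,       // ga: conductivity of air (m/s)
    double gs,       // gs: conductivity of plant stomata (m/s)
    double gamma = 66.0,      // γ: psychrometric constant (Pa/K)
    double lambdaV = 2450.0)  // λv: latent heat of vaporization (J/g)
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    return (delta * rn + rhoA * cp * dq * ga)
         / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
}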

Input data sources
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Pipeline diagram: the AzureMODIS Service Web Role Portal accepts requests via a request queue; a download queue feeds the Data Collection Stage, which pulls from the source imagery download sites; the Reprojection, Reduction 1, and Reduction 2 queues feed the Reprojection, Derivation Reduction, and Analysis Reduction stages; scientists download the scientific results.)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • receives all user requests
  • queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • parses all job requests into tasks, i.e., recoverable units of work
  • execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage> Task Queue.)

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshals all storage between Azure blob storage and local files on the Azure Worker instance

(Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read from and write to <Input> Data Storage.)

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

(Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work via the Job and Task Queues to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.)

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Data scale and cost by pipeline stage:

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3,500 CPU hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1,800 CPU hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1,800 CPU hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• They provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                    • Day 2 - Azure as a PaaS
                                    • Day 2 - Applications

Automated Service Management: provide code + a service model
• The platform identifies and allocates resources, deploys the service, and manages service health
• Configuration is handled by two files:
  ServiceDefinition.csdef
  ServiceConfiguration.cscfg

The service definition and service configuration can also be edited through the GUI: double-click the role name in the Azure project.
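As a quick illustration of the definition/configuration split: a setting is declared in ServiceDefinition.csdef, given a value in ServiceConfiguration.cscfg, and read at run time through the service runtime API. The setting name below is a hypothetical example; this is a minimal sketch, not production guidance.

using Microsoft.WindowsAzure.ServiceRuntime;

public static class RoleSettings
{
    public static string DataConnectionString()
    {
        // "DataConnectionString" must be declared in the .csdef and valued in the .cscfg
        return RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");
    }

    public static int SiblingInstanceCount()
    {
        // The instance count also comes from the configuration,
        // so it can be changed without redeploying the package
        return RoleEnvironment.CurrentRoleInstance.Role.Instances.Count;
    }
}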

                                      Deploying to the cloud

• We can deploy from the portal or from a script

• Visual Studio builds two files:
  • an encrypted package of your code
  • your configuration file

• You must create an Azure account, then a service, and then deploy your code

• Deployment can take up to 20 minutes

• (which is better than six months)

                                      Service Management API

• REST-based API to manage your services

• X.509 certificates for authentication

• Lets you create, delete, change, upgrade, swap…

• Lots of community- and Microsoft-built tools around the API

• Easy to roll your own (see the sketch below)
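A minimal sketch of rolling your own call against the Service Management REST API, assuming a management certificate has already been uploaded to the subscription. The subscription ID, certificate thumbprint, and x-ms-version value are placeholders/assumptions rather than verified values.

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "<subscription-id>";            // placeholder

        // Find the management certificate in the local store by thumbprint
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, "<thumbprint>", false)[0];

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.ClientCertificates.Add(cert);                    // X.509 authentication
        request.Headers.Add("x-ms-version", "2011-06-01");       // API version header (assumed value)

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());               // XML list of hosted services
        }
    }
}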

The Secret Sauce: The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   1.1 Determine resource requirements
   1.2 Create role images
2. Allocate resources
3. Prepare nodes
   3.1 Place role images on nodes
   3.2 Configure settings
   3.3 Start roles
4. Configure load balancers
5. Maintain service health
   5.1 If a role fails, restart the role based on policy
   5.2 If a node fails, migrate the role based on policy

                                      Storage

                                      Durable Storage At Massive Scale

Blob: massive files, e.g., videos, logs

Drive: use standard file system APIs

Tables: non-relational, but with few scale limits (use SQL Azure for relational data)

Queues: facilitate loosely coupled, reliable systems

                                      Blob Features and Functions

• Store large objects (up to 1 TB in size)

• Can be served through the Windows Azure CDN service

• Standard REST interface
  • PutBlob: inserts a new blob or overwrites an existing blob
  • GetBlob: gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

                                      Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob

• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

                                      Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted

                                      Windows Azure Tables

• Provides structured storage

• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows

• Highly available and durable
  • Data is replicated several times

• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST, with any platform or language

                                      Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery

• Simple asynchronous work dispatch

• Programming semantics ensure that a message is processed at least once

• Access is provided via REST

                                      Storage Partitioning

                                      Understanding partitioning is key to understanding performance

• Different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by a single server

• The system load-balances partitions based on traffic patterns

• The partition key controls entity locality and is the unit of scale

• Load balancing can take a few minutes to kick in

• It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"
• The system load-balances to meet your traffic needs
• "Server Busy" can also mean the limits of a single partition have been reached

                                      Partition Keys In Each Abstraction

Entities: TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order - 1             |              |                     | $3512
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order - 3             |              |                     | $1000

Blobs: Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarborbighousejpg
image          | foxboroughgillettejpg
video          | annarborbighousejpg

Messages: Queue name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

                                      Scalability Targets

                                      Storage Account

• Capacity: up to 100 TB

• Transactions: up to a few thousand requests per second

• Bandwidth: up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

To go above these numbers, partition between multiple storage accounts and partitions.

When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Single Blob Partition
• Throughput: up to 60 MB/s

Partitions and Partition Ranges

Example: a movie table partitioned by Category (PartitionKey), with Title as the RowKey.

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
...                     | ...                      | ...       | ...
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
...                     | ...                      | ...       | ...
Comedy                  | Office Space             | ...       | 1999
...                     | ...                      | ...       | ...
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
...                     | ...                      | ...       | ...
War                     | Defiance                 | ...       | 2008

The same table can be split into partition ranges (e.g., Action-Animation and Comedy-War) that the system can serve from different servers.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• PartitionKey is critical for scalability

Query efficiency and speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics and fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.
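As a small illustration, a point query with the WCF Data Services client looks like this; the entity type and table name match the code later in this deck, the email value is a placeholder, and IgnoreResourceNotFoundException avoids the 404 exception when the entity is missing.

// context is a TableServiceContext created from CloudTableClient.GetDataServiceContext()
context.IgnoreResourceNotFoundException = true;

AttendeeEntity attendee =
    (from e in context.CreateQuery<AttendeeEntity>("Attendees")
     where e.PartitionKey == "SummerSchool" && e.RowKey == "someone@example.com"
     select e).FirstOrDefault();   // exactly one PartitionKey + RowKey pair: a point query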

Expect Continuation Tokens - Seriously

A query can stop early and hand back a continuation token:
• Maximum of 1,000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
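A minimal sketch of walking continuation tokens by hand with the WCF Data Services client (assuming the .NET 4 System.Data.Services.Client API, inside a method where context is the TableServiceContext used later in this deck). Note that CloudTableQuery<T> obtained via AsTableServiceQuery(), also used later, can follow continuations for you.

DataServiceQuery<AttendeeEntity> query = context.CreateQuery<AttendeeEntity>("Attendees");

var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
while (true)
{
    foreach (AttendeeEntity entity in response)
    {
        Console.WriteLine(entity.Email);                 // process this page of results
    }

    DataServiceQueryContinuation<AttendeeEntity> token = response.GetContinuation();
    if (token == null)
        break;                                           // no more pages

    response = context.Execute(token);                   // fetch the next page
}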

Tables Recap
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Best practices:
• Select a PartitionKey and RowKey that help scale; distribute load by using a hash or similar prefix
• Avoid "append only" patterns
• Always handle continuation tokens; expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicate as separate queries
• Implement a back-off strategy for retries on "server busy": either the system is load-balancing partitions to meet traffic needs, or the load on a single partition has exceeded its limits

                                      WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

                                      Queues Their Unique Role in Building Reliable Scalable Applications

• We want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid scaling and performance

• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size

• Commonly used with the work ticket pattern (see the sketch below)

• Why not simply use a table?
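A minimal sketch of the work ticket pattern with the 1.x StorageClient library: the payload lives in a blob, and the queue message carries only a small ticket that points at it. The container, queue, and naming convention are illustrative choices.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static class WorkTicketSketch
{
    // Web role side: store the payload in a blob and enqueue a small work ticket
    public static void SubmitJob(CloudStorageAccount account, string jobPayloadXml)
    {
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("jobs");
        CloudQueue jobQueue = account.CreateCloudQueueClient().GetQueueReference("jobs");
        container.CreateIfNotExist();
        jobQueue.CreateIfNotExist();

        string jobId = Guid.NewGuid().ToString();
        container.GetBlobReference(jobId + ".xml").UploadText(jobPayloadXml);
        jobQueue.AddMessage(new CloudQueueMessage(jobId));        // the "work ticket"
    }

    // Worker role side: dequeue a ticket, fetch the payload, process it, then delete the message
    public static void ProcessNextJob(CloudStorageAccount account, Action<string> processJob)
    {
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("jobs");
        CloudQueue jobQueue = account.CreateCloudQueueClient().GetQueueReference("jobs");

        CloudQueueMessage ticket = jobQueue.GetMessage(TimeSpan.FromMinutes(5)); // visibility timeout
        if (ticket == null) return;

        string payload = container.GetBlobReference(ticket.AsString + ".xml").DownloadText();
        processJob(payload);
        jobQueue.DeleteMessage(ticket);   // delete only after successful processing
    }
}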

                                      Queue Terminology

                                      Message Lifecycle

(Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to the queue; Worker Roles call GetMessage with a visibility timeout to retrieve messages and RemoveMessage to delete them once processed.)

PutMessage:
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage:
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x (up to some maximum), and a successful poll resets the interval to 1. A sketch follows below.

                                      44
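A minimal sketch of truncated exponential back-off polling inside a worker role's Run() loop, where queue is a CloudQueue and ProcessMessage is a placeholder for your own handler; the 1-second floor and 32-second cap are illustrative choices.

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(32);       // truncation point (illustrative)

while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        ProcessMessage(msg);                           // hypothetical handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);            // successful poll: reset the interval
    }
    else
    {
        System.Threading.Thread.Sleep(interval);       // empty poll: wait, then double the interval
        long doubled = Math.Min(interval.Ticks * 2, maxInterval.Ticks);
        interval = TimeSpan.FromTicks(doubled);
    }
}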


Removing Poison Messages

Producers: P1, P2. Consumers: C1, C2.

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

45

Removing Poison Messages

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

46

Removing Poison Messages

1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)

Queues Recap
• Make message processing idempotent: then there is no need to deal with failures specially
• Do not rely on order: invisible messages result in out-of-order delivery
• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages (see the sketch below)
• Messages larger than 8 KB: use a blob to store the message data with a reference in the message, batch messages, and garbage-collect orphaned blobs
• Use the message count to scale: dynamically increase or reduce the number of workers
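A minimal sketch of using the dequeue count to remove poison messages; the threshold of 3 and the separate "poison" queue are illustrative choices, not part of the original deck.

const int MaxDequeueCount = 3;                         // threshold (illustrative)

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > MaxDequeueCount)
    {
        // The message has failed too many times: park it for offline inspection
        poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg.AsString);                  // hypothetical handler
        queue.DeleteMessage(msg);                      // delete only after successful processing
    }
}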

                                      Windows Azure Storage Takeaways

                                      Blobs

                                      Drives

                                      Tables

                                      Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                      49

                                      A Quick Exercise

…Then let's look at some code and some tools

                                      50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                      51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                      52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //   blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type
    // (e.g., "text/plain").
}

                                      53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                      54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                      55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

                                      56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                      57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                      58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred
    }
}

                                      59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                      60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                      61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                      62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                      63

Tools – Fiddler2

                                      Best Practices

                                      Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                      Using Your VM to the Maximum

Remember
• 1 role instance == 1 VM running Windows
• 1 role instance == one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using much CPU

• Balance using up CPU vs. having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

                                      Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O completion ports lets the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
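A minimal sketch of both flavors with .NET 4's Task Parallel Library; the methods being called (Reproject, DownloadSourceImagery, DeleteExpiredBlobs) are hypothetical placeholders for your own work items.

using System.Threading.Tasks;

// Data parallelism: fan a CPU-bound transform across all cores of the instance
Parallel.ForEach(inputTiles, tile => Reproject(tile));

// Task parallelism: run independent pipeline steps concurrently
Task download = Task.Factory.StartNew(() => DownloadSourceImagery());
Task cleanup  = Task.Factory.StartNew(() => DeleteExpiredBlobs());
Task.WaitAll(download, cleanup);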

                                      Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                      Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• Trade off the risk of failure or a poor user experience from not having excess capacity against the cost of idling VMs

Performance vs. cost

                                      Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                      Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                      Compressing Content

1. Gzip all output content (see the sketch below)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
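As an illustration only, a minimal sketch of response compression in an ASP.NET web role, assuming a Global.asax request handler; it checks the Accept-Encoding header before wrapping the response stream:

using System;
using System.IO.Compression;
using System.Web;

protected void Application_BeginRequest(object sender, EventArgs e)
{
    HttpApplication app = (HttpApplication)sender;
    string acceptEncoding = app.Request.Headers["Accept-Encoding"];
    if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
    {
        // Compress the outgoing response and tell the browser how to decode it.
        app.Response.Filter = new GZipStream(app.Response.Filter, CompressionMode.Compress);
        app.Response.AppendHeader("Content-Encoding", "gzip");
    }
}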

[Diagram: uncompressed content is reduced to compressed content via Gzip, plus minifying JavaScript, CSS, and images]

                                      Best Practices Summary

Doing 'less' is the key to saving costs

                                      Measure everything

                                      Know your application profile in and out

                                      Research Examples in the Cloud

…on another set of slides

                                      Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the web
• Hadoop implementation
• Hadoop has a long history and has been improved for stability
• Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
• Designed from the start to use cloud primitives
• Built-in fault tolerance
• REST-based interface for writing your own clients

                                      76

                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                      77

Questions and Discussion…

                                      Thank you for hosting me at the Summer School

                                      BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700–1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large volume of data

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
• split the input sequences
• query partitions in parallel
• merge results together when done

• Follows the general suggested application model
• Web Role + Queue + Worker

• With three special considerations
• Batch job management
• Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple split/join pattern

Leverage the multi-core capability of one instance
• argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partition: load imbalance
• Small partition: unnecessary overheads
• NCBI-BLAST overhead
• Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
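A minimal sketch of how a worker might dequeue a BLAST task with an estimated visibility timeout; the taskQueue object and the EstimateMinutes/RunBlast helpers are hypothetical, and the estimate is capped below the 2-hour queue limit:

using System;
using Microsoft.WindowsAzure.StorageClient;

// Estimate run time from the partition size, then cap it under the 2-hour limit.
TimeSpan estimate = TimeSpan.FromMinutes(Math.Min(EstimateMinutes(partitionPairBases), 110));
CloudQueueMessage task = taskQueue.GetMessage(estimate);
if (task != null)
{
    RunBlast(task.AsString);          // hypothetical helper that runs the partition
    taskQueue.DeleteMessage(task);    // delete only after the work completed
}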

[Diagram: a splitting task fans the input out into many BLAST tasks, whose results are combined by a merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role contains the job scheduler, scaling engine, and database-updating role; worker instances pull work from a global dispatch queue; Azure Tables hold the job registry, and Azure Blobs hold the NCBI/BLAST databases and temporary data]


ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min

• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total 9,865,668 sequences to be queried

• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance

• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
• 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe

• 8 deployments of AzureBLAST
• Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments
• Each segment is submitted to one deployment as one job for execution

• Each segment consists of smaller partitions

• When load imbalances appear, redistribute the load manually


• Total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, real working instance time should be 6–8 days

• Look into the log data to analyze what took place…


A normal log record should look like this:

Otherwise something is wrong (e.g. the task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: in total 34,256 tasks processed

All 62 compute nodes lost tasks and then came back in a group (this is an update domain)

~30 mins

~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe Datacenter: 30,976 tasks were completed, and the job was killed

A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J/g)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants

NASA MODIS imagery source archives: 5 TB (600K files)

FLUXNET curated sensor dataset: 30 GB (960 files)

FLUXNET curated field dataset: 2 KB (1 file)

NCEP/NCAR: ~100 MB (4K files)

Vegetative clumping: ~5 MB (1 file)

Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; a request queue and download queue drive the data collection stage, which pulls tiles from source imagery download sites and records source metadata; the reprojection queue drives the reprojection stage, and the reduction 1 and reduction 2 queues drive the derivation and analysis reduction stages; scientific results are then downloaded by the scientists]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
• Receives all user requests
• Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
• Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshalls all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read from <Input>Data Storage]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

[Reprojection example: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue for GenericWorker (Worker Role) instances, which read from Swath Source Data Storage and write Reprojection Data Storage]

• Each ReprojectionJobStatus entity specifies a single reprojection job request

• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)

• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile

• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate-student rates

[Cost-annotated pipeline diagram (AzureMODIS Service Web Role Portal and stage queues), summarized below]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage

Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                      • Day 2 - Azure as a PaaS
                                      • Day 2 - Applications

                                        Service Definition

                                        Service Configuration

                                        GUI

                                        Double click on Role Name in Azure Project

                                        Deploying to the cloud

• We can deploy from the portal or from a script

• VS builds two files:

• An encrypted package of your code

• Your config file

• You must create an Azure account, then a service, and then you deploy your code

• Can take up to 20 minutes

• (which is better than six months)

                                        Service Management API

• REST-based API to manage your services

• X509 certs for authentication

• Lets you create, delete, change, upgrade, swap…

• Lots of community and MSFT-built tools around the API

- Easy to roll your own (see the sketch below)
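As an illustration, a minimal sketch of calling the Service Management API directly from C#; the subscription ID, certificate file, and x-ms-version value are placeholders you would substitute with your own, and the certificate must include the private key matching the management certificate uploaded to the portal:

using System;
using System.Net;
using System.Security.Cryptography.X509Certificates;

// List the hosted services for a subscription (assumed endpoint and API version string).
string subscriptionId = "<subscription-id>";
var request = (HttpWebRequest)WebRequest.Create(
    "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
request.Headers.Add("x-ms-version", "2011-06-01");                    // assumed version
request.ClientCertificates.Add(
    new X509Certificate2("management.pfx", "password"));              // cert with private key
using (var response = (HttpWebResponse)request.GetResponse())
{
    Console.WriteLine(response.StatusCode);                           // 200 OK on success
}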

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure

1. Process the service model
   1. Determine resource requirements
   2. Create role images

2. Allocate resources

3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles

4. Configure load balancers

5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

                                        Storage

Durable Storage at Massive Scale

Blob
- Massive files, e.g. videos, logs

Drive
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data

Queues
- Facilitate loosely-coupled, reliable systems

                                        Blob Features and Functions

• Store large objects (up to 1 TB in size)

• Can be served through the Windows Azure CDN service

• Standard REST interface

• PutBlob
• Inserts a new blob or overwrites an existing blob

• GetBlob
• Gets a whole blob or a specific range

• DeleteBlob

• CopyBlob

• SnapshotBlob

• LeaseBlob

                                        Two Types of Blobs Under the Hood

• Block Blob

• Targeted at streaming workloads

• Each blob consists of a sequence of blocks
• Each block is identified by a Block ID (see the block-blob sketch below)

• Size limit: 200 GB per blob

• Page Blob

• Targeted at random read/write workloads

• Each blob consists of an array of pages
• Each page is identified by its offset from the start of the blob

• Size limit: 1 TB per blob
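A minimal sketch of uploading a block blob in explicit blocks with the StorageClient library, assuming a CloudBlobContainer named container already exists; the file name and 4 MB block size are placeholders, and block IDs must be base64-encoded strings of equal length:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

CloudBlockBlob blob = container.GetBlockBlobReference("bigvideo.wmv");
List<string> blockIds = new List<string>();
byte[] buffer = new byte[4 * 1024 * 1024];                     // 4 MB blocks
using (FileStream file = File.OpenRead(@"bigvideo.wmv"))
{
    int read, index = 0;
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        string blockId = Convert.ToBase64String(
            Encoding.UTF8.GetBytes(index.ToString("d6")));     // fixed-length block IDs
        blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
        blockIds.Add(blockId);
        index++;
    }
}
blob.PutBlockList(blockIds);                                   // commit the uploaded blocks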

                                        Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use

• Use existing NTFS APIs to access a durable drive
• Durability and survival of data on application failover

• Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob

• Example: mount a Page Blob as X:\ (see the sketch below)
• http://<accountname>.blob.core.windows.net/<containername>/<blobname>

• All writes to the drive are made durable to the Page Blob
• The drive is made durable through standard Page Blob replication

• The drive persists as a Page Blob even when not mounted
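A hedged sketch of creating and mounting a drive with the CloudDrive API; the local-resource name, connection-string name, blob path, and sizes are placeholders, and the exact cache setup depends on your role configuration:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Reserve part of the role's local storage as the drive cache (assumed resource name).
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

CloudStorageAccount account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");  // page blob backing the drive
drive.Create(1024);                                                // size in MB; throws if it already exists
string driveLetter = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
// driveLetter (e.g. "X:\") can now be used with normal NTFS file APIs.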

                                        Windows Azure Tables

• Provides structured storage

• Massively scalable tables
• Billions of entities (rows) and TBs of data

• Can use thousands of servers as traffic grows

• Highly available & durable
• Data is replicated several times

• Familiar and easy-to-use API

• WCF Data Services and OData
• .NET classes and LINQ

• REST – with any platform or language

                                        Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery

• Simple asynchronous work dispatch (see the sketch below)

• Programming semantics ensure that a message can be processed at least once

• Access is provided via REST
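A minimal work-dispatch sketch with the StorageClient queue API; the account object, queue name, message payload, and the Process helper are placeholders:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueClient queueClient = account.CreateCloudQueueClient();   // 'account' built from your credentials
CloudQueue queue = queueClient.GetQueueReference("workitems");
queue.CreateIfNotExist();

// Producer (e.g. a web role): enqueue a work ticket.
queue.AddMessage(new CloudQueueMessage("blob-name-to-process"));

// Consumer (e.g. a worker role): dequeue, process, then delete.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    Process(msg.AsString);       // hypothetical work
    queue.DeleteMessage(msg);    // only after successful processing
}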

                                        Storage Partitioning

Understanding partitioning is key to understanding performance

• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by a single server

• The system load-balances partitions based on traffic pattern

• The partition key controls entity locality and is the unit of scale

• Load balancing can take a few minutes to kick in

• It can take a couple of seconds for a partition to become available on a different server

System load balancing

• Use exponential backoff on "Server Busy"

• The system load-balances to meet your traffic needs

• "Server Busy" can also mean single-partition limits have been reached

                                        Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarborbighouse.jpg
image          | foxboroughgillette.jpg
video          | annarborbighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

                                        Scalability Targets

Storage Account

• Capacity – up to 100 TB

• Transactions – up to a few thousand requests per second

• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition

• Up to 500 transactions per second

To go above these numbers, partition between multiple storage accounts and partitions

When the limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (see the retry sketch below)

Single Blob Partition

• Throughput up to 60 MB/s
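A minimal retry sketch for "Server Busy" responses with the StorageClient library; the attempt cap and delays are arbitrary placeholder values:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

string ReadWithBackoff(CloudBlob blob)
{
    int attempt = 0;
    while (true)
    {
        try
        {
            return blob.DownloadText();
        }
        catch (StorageServerException)           // covers 503 Server Busy and server timeouts
        {
            if (++attempt > 5) throw;            // give up after a few tries
            Thread.Sleep(TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt)));
        }
    }
}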

Partitions and Partition Ranges

Example: a movie table partitioned by Category

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

The same table can be served as a single partition range on one server, or split into multiple ranges (e.g. Action–Animation on one server and Comedy–War on another) as load requires.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

You can get a continuation token whenever:
• a response reaches the maximum of 1000 rows
• the query reaches the end of a partition range boundary
• the query hits the maximum of 5 seconds to execute

(see the sketch below)
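A minimal sketch of the two common ways to handle continuation in the .NET client; CloudTableQuery follows tokens for you, while the WCF Data Services API exposes them explicitly. The table and entity names follow the sample code later in the deck:

using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

// Option 1: CloudTableQuery<T> transparently follows continuation tokens.
CloudTableQuery<AttendeeEntity> query =
    context.CreateQuery<AttendeeEntity>("Attendees").AsTableServiceQuery();
foreach (AttendeeEntity attendee in query) { /* process */ }

// Option 2: handle the tokens yourself with WCF Data Services.
var dsQuery = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");
DataServiceQueryContinuation<AttendeeEntity> token = null;
do
{
    QueryOperationResponse<AttendeeEntity> response = token == null
        ? (QueryOperationResponse<AttendeeEntity>)dsQuery.Execute()
        : context.Execute(token);
    foreach (AttendeeEntity attendee in response) { /* process */ }
    token = response.GetContinuation();
} while (token != null);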

Tables Recap

• Select a PartitionKey and RowKey that help scale: efficient for frequently used queries, supports batch transactions, distributes load

• Avoid "append only" patterns: distribute by using a hash etc. as a prefix

• Always handle continuation tokens: expect continuation tokens for range queries

• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries

• Implement a back-off strategy for retries: "Server Busy" means either the system is load-balancing partitions to meet traffic needs or the load on a single partition has exceeded the limits

                                        WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together
• Tight coupling leads to brittleness

• This decoupling can aid in scaling and performance

• A queue can hold an unlimited number of messages
• Messages must be serializable as XML

• Limited to 8 KB in size

• Commonly uses the work-ticket pattern (see the sketch below)

• Why not simply use a table?
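A minimal sketch of the work-ticket pattern under the 8 KB limit: put the payload in a blob and enqueue only a reference to it. The container, queue, blob name, and payload variable are placeholders:

using Microsoft.WindowsAzure.StorageClient;

// Producer: store the large payload in a blob, enqueue a small ticket that points to it.
CloudBlob payload = container.GetBlobReference("tickets/job-0001.xml");
payload.UploadText(largeXmlPayload);
queue.AddMessage(new CloudQueueMessage("tickets/job-0001.xml"));

// Consumer: dereference the ticket, do the work, then clean up.
CloudQueueMessage ticket = queue.GetMessage();
if (ticket != null)
{
    string xml = container.GetBlobReference(ticket.AsString).DownloadText();
    // ... process xml ...
    queue.DeleteMessage(ticket);
}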

Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1-4 to a queue; Worker Roles call GetMessage (with a timeout) to receive a message (e.g. Msg 2 or Msg 1), and call RemoveMessage when processing completes]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1 (a sketch follows below).
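A minimal polling loop with truncated exponential back-off; the 5-minute visibility timeout, the 60-second cap, and the Process helper are placeholder choices:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);
while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5));
    if (msg != null)
    {
        Process(msg.AsString);                       // hypothetical work
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);          // successful poll: reset the interval
    }
    else
    {
        Thread.Sleep(interval);                      // empty poll: wait, then double the interval
        interval = TimeSpan.FromSeconds(
            Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
    }
}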

Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue and process them]

Scenario 1:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

Scenario 2:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

Scenario 3:
1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)

Queues Recap

• No need to deal with failures: make message processing idempotent

• Invisible messages result in out-of-order delivery: do not rely on order

• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages (see the sketch below)

• Messages > 8 KB: use a blob to store the message data, with a reference in the message

• Batch messages

• Garbage-collect orphaned blobs

• Dynamically increase/reduce workers: use the message count to scale
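A minimal sketch of poison-message handling using the dequeue count; the threshold of 3, the dead-letter queue, and the Process helper are placeholder choices:

using System;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 3)
    {
        // Poison message: park it elsewhere (or just log it) and remove it from the work queue.
        deadLetterQueue.AddMessage(new CloudQueueMessage(msg.AsString));
        queue.DeleteMessage(msg);
    }
    else
    {
        Process(msg.AsString);       // hypothetical, idempotent work
        queue.DeleteMessage(msg);
    }
}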

                                        Windows Azure Storage Takeaways

                                        Blobs

                                        Drives

                                        Tables

                                        Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                        49

A Quick Exercise

…Then let's look at some code and some tools

                                        50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                        51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                        52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// And make sure you set the content type to the appropriate MIME type (e.g. "text/plain")

                                        53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                        54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                        55

Code – Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    ...
}

                                        56

Code – Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                        57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                        58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                        59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                        60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                        61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                        62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                        63

Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• Pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures needed to take your service down)
• Only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting up code into multiple roles, each not using up its CPU
• Balance between using up CPU vs. having free capacity in times of need
• Multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
• Data parallelism
• Task parallelism
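As a minimal sketch of the two .NET 4 options named above (the work items and the two activities are illustrative placeholders, not part of the original deck):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencySketch
{
    static void Main()
    {
        // Data parallelism: spread CPU-bound work across all cores of the role instance.
        var workItems = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
        Parallel.ForEach(workItems, item => Console.WriteLine("Processed item " + item));

        // Task parallelism: run independent activities concurrently.
        Task first = Task.Factory.StartNew(() => Console.WriteLine("Downloading inputs..."));
        Task second = Task.Factory.StartNew(() => Console.WriteLine("Cleaning up old results..."));
        Task.WaitAll(first, second);
    }
}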

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• Trade-off between the risk of failure/poor user experience due to not having excess capacity (performance) and the costs of having idling VMs (cost)

Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• Service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size (a compression sketch follows the diagram below)
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content]
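A minimal sketch of the compute-for-storage trade-off using GZipStream from System.IO.Compression (the helper class name is illustrative); compress page output or blob payloads before storing or serving them, and remember to set the appropriate Content-Encoding header when serving pre-compressed content:

using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressionHelper
{
    // Compress a string (e.g. rendered page output or a blob payload) with GZip.
    public static byte[] GzipString(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();
        }
    }
}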

Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile in and out.

                                        Research Examples in the Cloud

…on another set of slides

                                        Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web
   • Hadoop implementation
   • Hadoop has a long history and has been improved for stability
   • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
   • Designed from the start to use cloud primitives
   • Built-in fault tolerance
   • REST-based interface for writing your own clients

                                        76

                                        Project Daytona - Map Reduce on Azure

                                        httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                                        77

Questions and Discussion…

                                        Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
   • split the input sequences
   • query partitions in parallel
   • merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
   • Batch job management
   • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• too small: repeated computation
• too large: unnecessarily long waiting time in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation

[Diagram: a Splitting task fans the input out to many BLAST tasks that run in parallel, followed by a Merging task that combines the results]

Task size vs. Performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. Performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / Instance size vs. Cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Portal and Web Service (Web Role) accept job registrations; a Job Scheduler in the Job Management Role places work on a global dispatch queue that feeds the Worker instances; a Scaling Engine and a Database Updating Role run alongside; an Azure Table holds the Job Registry, and Azure Blob storage holds the NCBI databases, BLAST databases, and temporary data. Within a job, a Splitting task fans out to parallel BLAST tasks, followed by a Merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Web Portal and Web Service feed job registrations to the Job Scheduler; the Job Portal, Scaling Engine, and Job Registry support the flow]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB in size)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4,000 instances
   • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
   • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
   • Each segment is submitted to one deployment as one job for execution
   • Each segment consists of smaller partitions
• When load imbalances, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, over ~30 mins) – this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and then the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish Proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

                                        Penman-Monteith (1964)

                                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Data inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

                                        20 US year = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests via a Request Queue; the Data Collection Stage (Download Queue) pulls source imagery from the download sites and records source metadata; it is followed by the Reprojection Stage (Reprojection Queue), the Derivation Reduction Stage (Reduction 1 Queue), and the Analysis Reduction Stage (Reduction 2 Queue); scientists download the scientific results]

                                        httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• ModisAzure Service is the Web Role front door
   • Receives all user requests
   • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
   • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request flows into the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, from which GenericWorker (Worker Role) instances dequeue tasks and read <Input>Data Storage]

Reprojection example:

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work via the Job Queue and Task Queue to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Cost diagram, approximate figures per pipeline stage (AzureMODIS Service Web Role Portal, Request Queue, and the per-stage queues as above):
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage
Total: $1420]

• Clouds are the largest-scale computer centers ever constructed, and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications, and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• They provide valuable fault tolerance and scalability abstractions
• Clouds can act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                        • Day 2 - Azure as a PaaS
                                        • Day 2 - Applications

Service Configuration

GUI: double-click on the Role Name in the Azure Project

Deploying to the cloud
• We can deploy from the portal or from script
• VS builds two files:
   • an encrypted package of your code
   • your config file
• You must create an Azure account, then a service, and then you deploy your code
• Can take up to 20 minutes (which is better than six months)

Service Management API
• REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap, …
• Lots of community- and MSFT-built tools around the API – easy to roll your own

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage, At Massive Scale

Blob – massive files, e.g. videos, logs
Drive – use standard file system APIs
Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
Queues – facilitate loosely-coupled, reliable systems

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface
   • PutBlob – inserts a new blob or overwrites an existing blob
   • GetBlob – gets a whole blob or a specific range
   • DeleteBlob
   • CopyBlob
   • SnapshotBlob
   • LeaseBlob

Two Types of Blobs Under the Hood
• Block Blob
   • Targeted at streaming workloads
   • Each blob consists of a sequence of blocks; each block is identified by a Block ID
   • Size limit: 200 GB per blob
• Page Blob
   • Targeted at random read/write workloads
   • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
   • Size limit: 1 TB per blob

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
• Use existing NTFS APIs to access a durable drive
   • Durability and survival of data on application failover
   • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
   • Example: mount a Page Blob as X:\
   • http://&lt;accountname&gt;.blob.core.windows.net/&lt;containername&gt;/&lt;blobname&gt;
• All writes to the drive are made durable to the Page Blob
   • The drive is made durable through standard Page Blob replication
   • The drive persists even when not mounted
(see the sketch below)
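A minimal mount sketch, assuming the classic Microsoft.WindowsAzure.CloudDrive library; the connection-string name, blob path, and sizes are illustrative, and cache initialization and error handling are omitted:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Create a page-blob-backed drive and mount it as an NTFS volume.
CloudStorageAccount account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");
drive.Create(64);                                                  // create a 64 MB drive (throws if it already exists)

string driveLetterPath = drive.Mount(25, DriveMountOptions.None);  // e.g. "X:\"
// ... use normal file system APIs against driveLetterPath ...
drive.Unmount();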

Windows Azure Tables
• Provides structured storage
• Massively scalable tables
   • Billions of entities (rows) and TBs of data
   • Can use thousands of servers as traffic grows
• Highly available & durable
   • Data is replicated several times
• Familiar and easy-to-use API
   • WCF Data Services and OData
   • .NET classes and LINQ
   • REST – with any platform or language

Windows Azure Queues
• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple, asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance
• Different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"
• The system load balances to meet your traffic needs
• "Server Busy" means the limits of a single partition have been reached

Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3             |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

• Messages – Queue Name: all messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (see the sketch below).
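A minimal retry sketch with exponential backoff (the delays, attempt count, and the caught exception type are illustrative; in practice you would catch the storage client's server-busy and timeout exceptions specifically):

using System;
using System.Threading;

static class RetryHelper
{
    // Retry a storage operation, doubling the delay after each failed attempt.
    public static void WithBackoff(Action storageOperation, int maxAttempts)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                storageOperation();
                return;
            }
            catch (Exception)
            {
                if (attempt >= maxAttempts)
                    throw;                     // give up after maxAttempts
                Thread.Sleep(delay);
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
            }
        }
    }
}

// Usage: RetryHelper.WithBackoff(() => container.CreateIfNotExist(), 5);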

[Table: a Movies table keyed on PartitionKey (Category) and RowKey (Title), with Timestamp and ReleaseDate columns:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | …         | 2009
  Action                  | The Bourne Ultimatum     | …         | 2007
  …                       | …                        | …         | …
  Animation               | Open Season 2            | …         | 2009
  Animation               | The Ant Bully            | …         | 2006
  …                       | …                        | …         | …
  Comedy                  | Office Space             | …         | 1999
  …                       | …                        | …         | …
  SciFi                   | X-Men Origins: Wolverine | …         | 2009
  …                       | …                        | …         | …
  War                     | Defiance                 | …         | 2008

The same entities are also shown split into two partition ranges (Action–Animation and Comedy–War), which can be served by different servers.]

                                          Partitions and Partition Ranges

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query

(see the sketch below)
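A sketch of following continuation tokens explicitly with WCF Data Services, reusing the TableServiceContext and AttendeeEntity from the earlier TableHelper code (the helper method itself is illustrative; note that AsTableServiceQuery() can also follow continuations for you when you simply enumerate it):

using System.Collections.Generic;
using System.Data.Services.Client;

public static List<AttendeeEntity> ReadAllPages(TableServiceContext context, string tableName)
{
    var results = new List<AttendeeEntity>();
    var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>(tableName);

    var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
    results.AddRange(response);

    DataServiceQueryContinuation<AttendeeEntity> token;
    while ((token = response.GetContinuation()) != null)
    {
        response = context.Execute(token);   // fetch the next page (up to 1000 rows per response)
        results.AddRange(response);
    }
    return results;
}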

Tables Recap
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

• Select a PartitionKey and RowKey that help scale – distribute load by using a hash etc. as a prefix
• Avoid "append only" patterns
• Always handle continuation tokens – expect them for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "Server busy" means the load on a single partition has exceeded the limits; the system load balances partitions to meet traffic needs

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
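A small sketch of the last two tips, reusing the Client property and AttendeeEntity from the earlier TableHelper code (the query values are illustrative):

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

// A fresh context per logical operation, tolerant of missing rows.
TableServiceContext context = Client.GetDataServiceContext();
context.IgnoreResourceNotFoundException = true;   // a missing entity yields an empty result instead of a 404 exception

var query = (from a in context.CreateQuery<AttendeeEntity>("Attendees")
             where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.com"
             select a).AsTableServiceQuery();
AttendeeEntity attendee = query.Execute().FirstOrDefault();   // null if the attendee does not exist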

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together, but are not bound together – tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
   • Messages must be serializable as XML
   • Limited to 8 KB in size
• Commonly use the work ticket pattern (see the sketch below)
• Why not simply use a table?
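A minimal sketch of the work ticket pattern with the StorageClient library: the real payload goes to blob storage and only a small "ticket" (the blob name) goes on the queue. The queue and container names, blob name, and payload variable are illustrative:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudStorageAccount account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

// Store the large payload as a blob (it can be far bigger than the 8 KB message limit).
CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("uploads");
container.CreateIfNotExist();
CloudBlob payload = container.GetBlobReference("job-42.xml");
payload.UploadText(largeJobDescriptionXml);

// Enqueue only the work ticket; a worker role dequeues it and fetches the blob.
CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("tickets");
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage("job-42.xml"));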

Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a Queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed]

PutMessage:

POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

RemoveMessage:

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
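The same lifecycle through the StorageClient library rather than raw REST (a sketch; 'account' is a CloudStorageAccount as in the earlier snippets, and the queue name and message body are illustrative):

using System;
using Microsoft.WindowsAzure.StorageClient;

CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("myqueue");
queue.CreateIfNotExist();

queue.AddMessage(new CloudQueueMessage("<test>test</test>"));          // PutMessage

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));    // GetMessage with a 30 s visibility timeout
if (msg != null)
{
    // ... process the message ...
    queue.DeleteMessage(msg);                                          // RemoveMessage (uses the pop receipt internally)
}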

Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the interval by 2x, and a successful poll resets the interval back to 1 (see the sketch below).
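A sketch of such a polling loop inside a worker role (the interval bounds and the ProcessMessage handler are illustrative):

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(64);

while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        ProcessMessage(msg);                    // hypothetical handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);     // success: reset the polling interval
    }
    else
    {
        Thread.Sleep(interval);                 // empty poll: wait, then double the interval (truncated at the maximum)
        double next = Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds);
        interval = TimeSpan.FromSeconds(next);
    }
}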

                                          44

Removing Poison Messages

Producers (P1, P2) put messages on queue Q; consumers (C1, C2) take them off with a 30-second visibility timeout:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

                                          45

Removing Poison Messages (continued)

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

                                          46

                                          C1

                                          C2

                                          Removing Poison Messages

                                          3 4 0

                                          Producers Consumers

                                          P2

                                          P1

                                          1 2

                                          2 Dequeue(Q 30 sec) msg 2 3 C2 consumed msg 2 4 Delete(Q msg 2) 7 Dequeue(Q 30 sec) msg 1 8 C2 crashed

                                          1 Dequeue(Q 30 sec) msg 1 5 C1 crashed 10 C1 restarted 11 Dequeue(Q 30 sec) msg 1 12 DequeueCount gt 2 13 Delete (Q msg1) 1

                                          2

                                          6 msg1 visible 30s after Dequeue 9 msg1 visible 30s after Dequeue

                                          3 0

                                          1 3

                                          1 2

                                          1 3
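A minimal sketch of the dequeue-count check shown above; the threshold of 2 follows the slide, while the dead-letter handling is an illustrative assumption.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageGuard
{
    public static void ProcessWithPoisonCheck(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > 2)
        {
            // Poison message: it has already failed processing repeatedly.
            // Log it (or copy it to a "dead letter" blob/table) and remove it
            // so that it cannot block the queue forever.
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing; delete only on success ...
        queue.DeleteMessage(msg);
    }
}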

                                          Queues Recap

• Make message processing idempotent – then there is no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order processing
• Enforce a threshold on a message's dequeue count – use the dequeue count to remove poison messages
• Messages > 8 KB – use a blob to store the message data with a reference in the message; batch messages; garbage-collect orphaned blobs
• Dynamically increase/reduce workers – use the message count to scale

                                          Windows Azure Storage Takeaways

                                          Blobs

                                          Drives

                                          Tables

                                          Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                          49

                                          A Quick Exercise

…Then let's look at some code and some tools

                                          50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                          51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                          52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                          53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                          54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                          55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    ...
}

                                          56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                          57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                          58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                          59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                          60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                          61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                          62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                          63

Tools – Fiddler2

                                          Best Practices

                                          Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                          Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting up code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                          Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
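A minimal sketch of both flavors using the .NET 4 Task Parallel Library; the work items and helper methods are placeholders, not from the slides.

using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrencyExamples
{
    // Data parallelism: apply the same operation to many items,
    // letting the TPL schedule the work across the VM's cores.
    public static void ProcessAll(IEnumerable<string> workItems)
    {
        Parallel.ForEach(workItems, item =>
        {
            ProcessOne(item);   // placeholder for the real per-item work
        });
    }

    // Task parallelism: run different operations concurrently.
    public static void RunPipelineStep()
    {
        Task download = Task.Factory.StartNew(() => DownloadInput());
        Task cleanup  = Task.Factory.StartNew(() => CleanUpTempFiles());
        Task.WaitAll(download, cleanup);
    }

    private static void ProcessOne(string item) { /* ... */ }
    private static void DownloadInput() { /* ... */ }
    private static void CleanUpTempFiles() { /* ... */ }
}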

                                          Finding Good Code Neighbors

• Typically code falls into one or more of these categories:
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                          Memory Intensive

                                          CPU Intensive

                                          Network IO Intensive

                                          Storage IO Intensive

                                          Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the costs of having idling VMs

                                          Performance Cost

                                          Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

                                          Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile
Sending fewer things over the wire often means getting fewer things from storage
Saving bandwidth costs often leads to savings in other places
Sending fewer things means your VM has time to do other tasks
All of these tips have the side benefit of improving your web app's performance and user experience

                                          Compressing Content

1. Gzip all output content (see the sketch after this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
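A minimal sketch of gzip-compressing text before storing or serving it, using .NET's built-in GZipStream; the helper name is an illustrative assumption (IIS/ASP.NET can also gzip responses for you via dynamic compression).

using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionHelper
{
    public static byte[] GzipString(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            // Compress into the memory stream; disposing the GZipStream flushes the gzip footer.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();   // store in blob storage or serve with Content-Encoding: gzip
        }
    }
}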

[Diagram: Uncompressed Content → Gzip → Compressed Content; also minify JavaScript, minify CSS, and minify images.]

                                          Best Practices Summary

Doing 'less' is the key to saving costs

                                          Measure everything

                                          Know your application profile in and out

                                          Research Examples in the Cloud

…on another set of slides

                                          Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                          76

                                          Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                          77

Questions and Discussion…

                                          Thank you for hosting me at the Summer School

                                          BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
Best practice: use test runs for profiling and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure
Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans the input out into many BLAST tasks, whose outputs are combined by a merging task.]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine; a global dispatch queue feeds the Worker roles; Azure Tables hold the Job Registry, Azure Blobs hold the NCBI/BLAST databases and temporary data, and a database-updating Role keeps the databases current.]


ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this:
Otherwise something is wrong (e.g. the task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: in total 34,256 tasks processed
All 62 compute nodes lost tasks and then came back in a group: this is an Update Domain
• ~30 mins
• ~6 nodes in one group
35 nodes experienced blob writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed
A reasonable guess: the Fault Domain is working

                                          MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = \frac{\Delta\,R_n + \rho_a c_p\,\delta q\,g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}

                                          Penman-Monteith (1964)

                                          Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
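As a worked example of the formula above, here is a small sketch that plugs representative values into the Penman-Monteith equation; the numeric inputs are illustrative assumptions only, not data from the slides.

using System;

public static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    public static double ComputeET(double delta, double rn, double rhoA, double cp,
                                   double deltaQ, double ga, double gs,
                                   double gamma, double lambdaV)
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }

    public static void Main()
    {
        // Illustrative input values (assumed, roughly in the units listed above).
        double delta = 145.0, rn = 400.0, rhoA = 1.2, cp = 1005.0;
        double deltaQ = 1000.0, ga = 0.02, gs = 0.01;
        double gamma = 66.0, lambdaV = 2450.0;

        Console.WriteLine("ET = " + ComputeET(delta, rn, rhoA, cp, deltaQ, ga, gs, gamma, lambdaV));
    }
}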

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives scientists' requests on a Request Queue; a Download Queue drives the Data Collection Stage, which pulls imagery from the Source Imagery Download Sites and records Source Metadata; the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue drive the Reprojection, Derivation Reduction, and Analysis Reduction stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: the MODISAzure Service (Web Role) persists <PipelineStage>JobStatus and places requests on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: GenericWorker (Worker Role) instances dequeue entries from the <PipelineStage>Task Queue dispatched by the Service Monitor (Worker Role), read from <Input>Data Storage, and update <PipelineStage>TaskStatus.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches work through the Job Queue and Task Queue to GenericWorker (Worker Role) instances; the workers read Swath Source Data Storage and write Reprojection Data Storage.]

• ReprojectionJobStatus – each entity specifies a single reprojection job request
• ReprojectionTaskStatus – each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta – query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList – query this table to get the list of satellite scan times that cover a target tile

• Computational costs driven by data scale and the need to run reductions multiple times
• Storage costs driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates

Per-stage data volumes and costs:
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                          • Day 2 - Azure as a PaaS
                                          • Day 2 - Applications

                                            GUI

                                            Double click on Role Name in Azure Project

                                            Deploying to the cloud

• We can deploy from the portal or from a script
• VS builds two files
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Can take up to 20 minutes
• (which is better than six months)

                                            Service Management API

• REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap…
• Lots of community and MSFT-built tools around the API
  - Easy to roll your own
The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:
1. Process the service model
   a. Determine resource requirements
   b. Create role images
2. Allocate resources
3. Prepare nodes
   a. Place role images on nodes
   b. Configure settings
   c. Start roles
4. Configure load balancers
5. Maintain service health
   a. If a role fails, restart the role based on policy
   b. If a node fails, migrate the role based on policy

                                            Storage

                                            Durable Storage At Massive Scale

Blob
- Massive files, e.g. videos, logs

Drive
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data

Queues
- Facilitate loosely coupled, reliable systems

                                            Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

                                            Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks
    • Each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages
    • Each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
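To make the block-blob model concrete, here is a hedged sketch of uploading a blob as explicit blocks with the StorageClient library; the PutBlock/PutBlockList signatures are from memory and should be treated as assumptions, and the block size is an illustrative choice.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class BlockBlobExample
{
    public static void UploadInBlocks(CloudBlobContainer container, string blobName, byte[] data)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        List<string> blockIds = new List<string>();
        const int blockSize = 4 * 1024 * 1024; // 4 MB per block (assumed choice)

        for (int offset = 0, i = 0; offset < data.Length; offset += blockSize, i++)
        {
            // Block IDs must be base64-encoded and of equal length within a blob.
            string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(i.ToString("d6")));
            int length = Math.Min(blockSize, data.Length - offset);
            using (MemoryStream block = new MemoryStream(data, offset, length))
            {
                blob.PutBlock(blockId, block, null);   // upload one block
            }
            blockIds.Add(blockId);
        }

        blob.PutBlockList(blockIds);                   // commit the blocks in order
    }
}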

                                            Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    • http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob
    • The drive is made durable through standard Page Blob replication
  • The drive persists even when not mounted as a Page Blob
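A rough sketch, from memory, of mounting a Windows Azure Drive from a page blob with the CloudDrive API that shipped alongside the SDK of this era; the local resource name, cache size, namespaces, and exact method signatures are assumptions and may differ from the actual library.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;   // CloudDrive types (assumed location)

public static class AzureDriveExample
{
    public static string MountDrive(CloudStorageAccount account)
    {
        // Reserve local disk space for the drive cache (assumed local resource name "DriveCache").
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // The drive is backed by a page blob (assumed container/blob name).
        string pageBlobUri = account.BlobEndpoint + "/drives/mydata.vhd";
        CloudDrive drive = account.CreateCloudDrive(pageBlobUri);
        try { drive.Create(1024); } catch (CloudDriveException) { /* drive already exists */ }

        // Mount returns the drive letter path, e.g. "X:\", usable with normal NTFS APIs.
        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
    }
}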

                                            Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

                                            Windows Azure Queues

• Queues are performance-efficient, highly available and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

                                            Storage Partitioning

                                            Understanding partitioning is key to understanding performance

Every data object has a partition key
• Different for each data type (blobs, entities, queues)

Partition key is the unit of scale
• Controls entity locality
• A partition can be served by a single server
• The system load-balances partitions based on traffic pattern

System load balancing
• Load balancing can take a few minutes to kick in
• Can take a couple of seconds for a partition to become available on a different server
• "Server Busy" means either that single-partition limits have been reached or that the system is load balancing to meet your traffic needs
• Use exponential backoff on "Server Busy"
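A minimal sketch of retrying a storage operation with exponential back-off when the service answers "Server Busy" (HTTP 503); the retry budget and delays are illustrative assumptions, and the catch clause should be narrowed to the storage client's server-busy exception in real code.

using System;
using System.Threading;

public static class RetryHelper
{
    public static T WithExponentialBackoff<T>(Func<T> storageOperation)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(500);   // initial delay (assumed)
        const int maxAttempts = 5;                         // assumed retry budget

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return storageOperation();
            }
            catch (Exception)   // narrow this to the storage client's "server busy" / 503 exception
            {
                if (attempt == maxAttempts) throw;
                Thread.Sleep(delay);
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);  // double each time
            }
        }
    }
}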

                                            Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)       | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith    | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1              |              |                     | $35.12
  2                         | Customer-Bill Johnson  | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3              |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

• Messages – Queue Name: all messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message 1
  jobs     | Message 2
  workflow | Message 1

                                            Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions
When the limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff

Partitions and Partition Ranges

Example movie table (PartitionKey = Category, RowKey = Title):

  PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
  Action                  | Fast & Furious            | …         | 2009
  Action                  | The Bourne Ultimatum      | …         | 2007
  …                       | …                         | …         | …
  Animation               | Open Season 2             | …         | 2009
  Animation               | The Ant Bully             | …         | 2006
  …                       | …                         | …         | …
  Comedy                  | Office Space              | …         | 1999
  …                       | …                         | …         | …
  SciFi                   | X-Men Origins: Wolverine  | …         | 2009
  …                       | …                         | …         | …
  War                     | Defiance                  | …         | 2008

The slide shows the same table split into two partition ranges (Action–Animation and Comedy–War), each of which can be served by a different server.

Key Selection: Things to Consider

PartitionKey is critical for scalability
• Distribute load as much as possible
• Hot partitions can be load balanced

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity Group Transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

A continuation token is returned when a query hits:

• A maximum of 1000 rows in a response

• The end of a partition range boundary

• A maximum of 5 seconds to execute the query
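A minimal sketch of handling this in code, reusing the AttendeeEntity type and table name from the samples later in this deck: wrapping the LINQ query with AsTableServiceQuery() gives a CloudTableQuery whose Execute() enumerator follows continuation tokens for you as you iterate.

CloudTableClient tableClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
TableServiceContext context = tableClient.GetDataServiceContext();

CloudTableQuery<AttendeeEntity> query =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool"
     select a).AsTableServiceQuery();

// Execute() lazily fetches each result segment, transparently following continuation
// tokens across the 1000-row, partition-boundary, and 5-second limits.
foreach (AttendeeEntity attendee in query.Execute())
{
    Console.WriteLine(attendee.Email);
}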

Tables Recap

Select PartitionKey and RowKey that help scale
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Avoid "append only" patterns
• Distribute by using a hash etc. as a prefix

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• "Server busy" means the load on a single partition has exceeded the limits
• The system load balances partitions to meet traffic needs

                                            WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
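A small sketch of the last point, using the AttendeeEntity type from the samples later in this deck (the table name and keys here are illustrative): with IgnoreResourceNotFoundException set, a point query for a missing entity returns an empty result instead of throwing.

TableServiceContext context =
    new CloudStorageAccount(AccountInformation.Credentials, false)
        .CreateCloudTableClient().GetDataServiceContext();           // a fresh context per operation
context.IgnoreResourceNotFoundException = true;                      // a missing entity no longer throws

AttendeeEntity attendee =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.com"
     select a).FirstOrDefault();                                     // null if the entity does not exist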

Queues: Their Unique Role in Building Reliable, Scalable Applications

• We want roles that work closely together but are not bound together; tight coupling leads to brittleness

• Decoupling through queues can aid scaling and performance

• A queue can hold an unlimited number of messages, but each message must be serializable as XML and is limited to 8 KB in size

• The work ticket pattern is commonly used (see the sketch below)

• Why not simply use a table?
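A minimal sketch of the work ticket pattern (the container, queue, and blob names here are assumed): the payload is written to blob storage and only a small "ticket" pointing at it is enqueued, keeping the message well under the 8 KB limit.

CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);
string jobPayloadXml = "<job>...</job>";   // potentially large work description

// Store the payload as a blob.
CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("workitems");
container.CreateIfNotExist();
container.GetBlobReference("job-0001.xml").UploadText(jobPayloadXml);

// Enqueue a small ticket that merely names the blob; a worker role dequeues the ticket,
// fetches the blob, does the work, and finally deletes the message.
CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage("job-0001.xml"));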

                                            Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1–4 to a queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
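The same lifecycle through the StorageClient library, as a sketch (the queue name and message text are placeholders):

CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("myqueue");
queue.CreateIfNotExist();

queue.AddMessage(new CloudQueueMessage("<test>some work ticket</test>"));   // PUT

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));         // GET with visibility timeout
if (msg != null)
{
    // ... process the message; it stays invisible to other consumers for 30 seconds ...
    queue.DeleteMessage(msg);                                               // DELETE via the pop receipt
}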

                                            Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1.
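A minimal sketch of truncated exponential backoff polling (the queue variable, the ProcessMessage helper, the 30-second visibility timeout, and the 60-second cap are all assumptions):

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        ProcessMessage(msg);                    // hypothetical worker method
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);     // successful poll: reset the interval
    }
    else
    {
        Thread.Sleep(interval);                 // empty poll: wait, then double the interval (truncated at the cap)
        interval = TimeSpan.FromSeconds(
            Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
    }
}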

                                            44


                                            Removing Poison Messages

[Diagram: producers P1 and P2 put messages on queue Q; consumers C1 and C2 dequeue them]

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2

                                            45

Removing Poison Messages

[Diagram: producers P1 and P2; consumers C1 and C2]

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. GetMessage(Q, 30 s) → msg 1

                                            46

Removing Poison Messages

[Diagram: producers P1 and P2; consumers C1 and C2]

1. Dequeue(Q, 30 sec) → msg 1
2. Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)
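A sketch of steps 12–13 in code (the threshold of 2 and the ProcessMessage helper are assumptions): the DequeueCount property on a retrieved message tells you how many times it has already been handed out.

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Poison message: it has repeatedly failed to be processed, so remove it
        // (optionally log it or park it in a blob/table for later inspection first).
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg);           // hypothetical worker method
        queue.DeleteMessage(msg);
    }
}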

Queues Recap

• Make message processing idempotent, so there is no need to deal with failures

• Do not rely on order; invisible messages result in out-of-order delivery

• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count

• Use a blob to store message data, with a reference in the message, for messages larger than 8 KB or to batch messages; garbage collect orphaned blobs

• Use the message count to scale: dynamically increase or reduce workers

                                            Windows Azure Storage Takeaways

                                            Blobs

                                            Drives

                                            Tables

                                            Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                            49

                                            A Quick Exercise

…Then let's look at some code and some tools

                                            50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                            51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                            52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                            53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available.
        return null;
    }
}

                                            54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");

    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                            55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    // ...
}

                                            56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                            57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                            58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred.
    }
}

                                            59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the in-memory cache.
    allAttendees.Remove(email);
}

                                            60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                            61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                            62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                            63

Tools – Fiddler2

                                            Best Practices

                                            Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• The fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts, measure, and find what is ideal for you

                                            Using Your VM to the Maximum

Remember:

• 1 role instance == 1 VM running Windows

• 1 role instance does not have to mean one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using much CPU

• Balance using up CPU against keeping free capacity for times of need

• There are multiple ways to use your CPU to the fullest

                                            Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• This may not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)

• Data parallelism

• Task parallelism
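A small .NET 4 Task Parallel Library sketch (the items array and the ProcessItem, CompressLogs, and UploadResults methods are hypothetical): Parallel.For illustrates data parallelism, Task.Factory.StartNew illustrates task parallelism.

// Data parallelism: spread CPU-bound per-item work across the available cores.
Parallel.For(0, items.Length, i =>
{
    ProcessItem(items[i]);
});

// Task parallelism: run independent units of work concurrently and wait for both.
Task compress = Task.Factory.StartNew(() => CompressLogs());
Task upload   = Task.Factory.StartNew(() => UploadResults());
Task.WaitAll(compress, upload);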

                                            Finding Good Code Neighbors

• Typically, code falls into one or more of the categories below

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                            Memory Intensive

                                            CPU Intensive

                                            Network IO Intensive

                                            Storage IO Intensive

                                            Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• It is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs

                                            Performance Cost

                                            Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference depending on your app profile

• Caching and compressing help a lot with storage costs

                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                            Compressing Content

1. Gzip all output content (a sketch follows this list)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
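A sketch of gzipping dynamic ASP.NET output (a common approach, not necessarily what any particular Azure sample does); it assumes the handler lives in Global.asax and uses System.Web plus System.IO.Compression:

protected void Application_BeginRequest(object sender, EventArgs e)
{
    HttpContext context = HttpContext.Current;
    string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;

    // Only compress when the client advertises gzip support.
    if (acceptEncoding.Contains("gzip"))
    {
        context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
        context.Response.AppendHeader("Content-Encoding", "gzip");
    }
}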

[Diagram: Uncompressed Content → Gzip / Minify JavaScript / Minify CSS / Minify Images → Compressed Content]

                                            Best Practices Summary

Doing 'less' is the key to saving costs

                                            Measure everything

                                            Know your application profile in and out

                                            Research Examples in the Cloud

…on another set of slides

                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • A Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                            76

                                            Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                            77

Questions and Discussion…

                                            Thank you for hosting me at the Summer School

                                            BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large data volumes

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation, data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), ACM, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs, and watch out for the 2-hour maximum limitation

[Diagram: a splitting task fans out into multiple BLAST tasks, which are followed by a merging task]

Task size vs. performance
• Benefits from the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram. Labels: Web Portal, Web Service, Job registration, Job Scheduler, Global dispatch queue, Worker (x3), Web Role, Job Management Role, Scaling Engine, Database updating Role, Azure Table (Job Registry), Azure Blob (BLAST databases, temporary data, NCBI databases)]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB); in total 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• It would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• Based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group; this is an update domain (~30 mins, ~6 nodes in one group)

35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed, and then the job was killed

A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)

Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = latent heat of vaporization (J/g)

Rn = net radiation (W m-2)

cp = specific heat capacity of air (J kg-1 K-1)

ρa = dry air density (kg m-3)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s-1)

gs = conductivity of plant stoma, air (inverse of rs) (m s-1)

γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs; big data reduction

• Some of the inputs are not so simple

Penman-Monteith (1964):

    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                            Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input data sources:

• NASA MODIS imagery source archives: 5 TB (600K files)

• FLUXNET curated sensor dataset: 30 GB (960 files)

• FLUXNET curated field dataset: 2 KB (1 file)

• NCEP/NCAR: ~100 MB (4K files)

• Vegetative clumping: ~5 MB (1 file)

• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram. Labels: AzureMODIS Service Web Role Portal, Request Queue, Download Queue, Source Metadata, Source Imagery Download Sites, Data Collection Stage, Reprojection Queue, Reprojection Stage, Reduction 1 Queue, Derivation Reduction Stage, Reduction 2 Queue, Analysis Reduction Stage, Scientific Results Download, Science results, Scientists]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks, which are recoverable units of work

• The execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus, and dispatches work onto the <PipelineStage> Task Queue]

All work is actually done by a GenericWorker (Worker Role)

• Sandboxes the science or other executable

• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue, from which GenericWorker (Worker Role) instances pull tasks and read <Input> Data Storage]

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches onto the Job and Task Queues; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage]

• ReprojectionJobStatus: each entity specifies a single reprojection job request

• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)

• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile

• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times

• Storage costs are driven by the data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline cost summary, per stage, reading the figure in order:]

Data Collection Stage (download from source imagery sites): 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage

Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                            • Day 2 - Azure as a PaaS
                                            • Day 2 - Applications

Deploying to the cloud

• We can deploy from the portal or from script
• Visual Studio builds two files:
  • An encrypted package of your code
  • Your config file
• You must create an Azure account, then a service, and then you deploy your code
• Deployment can take up to 20 minutes
• (which is better than six months)

Service Management API

• REST-based API to manage your services
• X.509 certificates for authentication
• Lets you create, delete, change, upgrade, swap…
• Lots of community- and MSFT-built tools around the API
  - Easy to roll your own (see the sketch below)
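A minimal sketch of rolling your own call against the Service Management API. The subscription ID and certificate thumbprint are placeholders, and the x-ms-version value is an assumption; the pattern is simply an HTTPS request authenticated with a client certificate.

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "<subscription-id>";            // hypothetical placeholder
        string thumbprint = "<management-cert-thumbprint>";     // cert previously uploaded to the portal

        // Load the management certificate from the local store
        X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, thumbprint, false)[0];

        // REST call: list the hosted services in the subscription
        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2011-06-01");      // assumed API version header
        request.ClientCertificates.Add(cert);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());               // XML list of services
        }
    }
}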

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure

1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

Storage

Durable Storage At Massive Scale

Blobs
- Massive files, e.g. videos, logs

Drives
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data

Queues
- Facilitate loosely coupled, reliable systems

Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface
  • PutBlob – inserts a new blob, overwrites an existing blob
  • GetBlob – gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob
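A hedged sketch of how a few of these operations look through the v1.x StorageClient library. It reuses the AccountInformation.Credentials helper shown later in this deck; the container and blob names are made up, and the mapping of client methods to REST operations is noted in comments.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlobOpsSketch
{
    static void Main()
    {
        var account = new CloudStorageAccount(AccountInformation.Credentials, false);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("school");
        container.CreateIfNotExist();

        CloudBlob original = container.GetBlobReference("notes.txt");
        original.UploadText("hello");                   // PutBlob

        CloudBlob snapshot = original.CreateSnapshot(); // SnapshotBlob – read-only point-in-time copy

        CloudBlob copy = container.GetBlobReference("notes-copy.txt");
        copy.CopyFromBlob(original);                    // CopyBlob

        string text = copy.DownloadText();              // GetBlob
    }
}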

Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks
  • Each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages
  • Each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
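A minimal sketch of the block-blob model: blocks are uploaded individually (each with a Base64 block ID) and only become the blob's content when the block list is committed. The container/blob names and block sizes are illustrative only; the local development storage account is assumed.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlockUploadSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;   // local storage emulator
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("demo");
        container.CreateIfNotExist();

        CloudBlockBlob blob = container.GetBlockBlobReference("bigfile.dat");
        var blockIds = new List<string>();

        for (int i = 0; i < 3; i++)
        {
            // Each block gets an ID; the blob is only visible once PutBlockList commits the sequence.
            string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(i.ToString("d6")));
            byte[] data = Encoding.UTF8.GetBytes(new string((char)('a' + i), 1024));
            blob.PutBlock(blockId, new MemoryStream(data), null);
            blockIds.Add(blockId);
        }

        blob.PutBlockList(blockIds);    // commits the ordered block list as the blob's content
    }
}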

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
  • http://<accountname>.blob.core.windows.net/<containername>/<blobname>
• All writes to the drive are made durable to the Page Blob
  • The drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when it is not mounted
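A hedged sketch of creating and mounting a drive with the CloudDrive API. It assumes the Microsoft.WindowsAzure.CloudDrive assembly, a local-storage resource named "DriveCache" declared in the service definition, and that the code runs inside a role instance – all of these are assumptions, not part of the deck.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class DriveMountSketch
{
    public static string MountDataDrive()
    {
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;

        // Reserve part of the role's local storage as the drive's read cache
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // The drive lives in a Page Blob; create it once, then mount it as an NTFS volume
        CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");
        try { drive.Create(64); }                      // size in MB; throws if the drive already exists
        catch (CloudDriveException) { /* already created */ }

        string drivePath = drive.Mount(25, DriveMountOptions.None);   // e.g. "X:\"
        return drivePath;                              // use normal NTFS APIs under this path
    }
}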

Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple, asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST
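A minimal sketch of basic queue usage with the v1.x StorageClient library (the queue name and message text are illustrative, and the local development storage account is assumed). The delete only happens after processing, which is what gives the at-least-once semantics mentioned above.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QueueSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("jobs");
        queue.CreateIfNotExist();

        // Producer: enqueue a work item
        queue.AddMessage(new CloudQueueMessage("process-order-42"));

        // Consumer: dequeue, process, then delete; the message stays invisible for the timeout
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            Console.WriteLine("Working on: " + msg.AsString);
            queue.DeleteMessage(msg);   // only delete after processing succeeds
        }
    }
}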

Storage Partitioning

Understanding partitioning is key to understanding performance

• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic patterns
  • Controls entity locality; the partition key is the unit of scale
  • Load balancing can take a few minutes to kick in
  • It can take a couple of seconds for a partition to become available on a different server
• Use exponential backoff on "Server Busy"
  • The system load balances to meet your traffic needs
  • "Server Busy" means the limits of a single partition have been reached

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)       | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith    | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1              |              |                     | $35.12
2                         | Customer-Bill Johnson  | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3              |              |                     | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Messages – Queue name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (see the retry-policy sketch below).
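One way to get the backoff behaviour without writing it by hand is the retry-policy hook in the v1.x StorageClient library. A hedged sketch, with retry counts and delays chosen arbitrarily for illustration:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class RetrySketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;

        // Attach an exponential-backoff retry policy so transient "503 Server Busy"
        // responses are retried automatically by the client.
        CloudTableClient tableClient = account.CreateCloudTableClient();
        tableClient.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(2));

        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        blobClient.RetryPolicy = RetryPolicies.RetryExponential(3, TimeSpan.FromSeconds(1));

        // Calls made through these clients now back off and retry on server-busy errors.
        tableClient.CreateTableIfNotExist("Attendees");
    }
}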

Partitions and Partition Ranges

Example – a Movies table partitioned by Category:

PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
Action                  | Fast & Furious            | …         | 2009
Action                  | The Bourne Ultimatum      | …         | 2007
…                       | …                         | …         | …
Animation               | Open Season 2             | …         | 2009
Animation               | The Ant Bully             | …         | 2006
…                       | …                         | …         | …
Comedy                  | Office Space              | …         | 1999
…                       | …                         | …         | …
SciFi                   | X-Men Origins: Wolverine  | …         | 2009
…                       | …                         | …         | …
War                     | Defiance                  | …         | 2008

The full table can be split into partition ranges (for example Action–Animation and Comedy–War), and each range can be served by a different server.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient (see the point-query sketch below)

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information
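A hedged sketch of a point query (PartitionKey + RowKey both specified) against the Movies example above. The MovieEntity class and "Movies" table name are assumptions made for illustration; with both keys fixed, the client addresses a single entity rather than scanning.

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class MovieEntity : TableServiceEntity
{
    public int ReleaseDate { get; set; }
}

class PointQuerySketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Both keys specified => the most efficient query the table service can run
        MovieEntity movie =
            (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "Action" && m.RowKey == "The Bourne Ultimatum"
             select m).AsTableServiceQuery().Execute().FirstOrDefault();
    }
}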

Expect Continuation Tokens – Seriously

Continuation tokens are returned when the query hits:
• a maximum of 1000 rows in a response
• the end of a partition range boundary
• a maximum of 5 seconds to execute the query
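A hedged sketch of handling continuation transparently with the StorageClient library. It assumes the AttendeeEntity class and "Attendees" table shown later in this deck; the point is that AsTableServiceQuery() wraps the query in a CloudTableQuery<T>, whose enumeration follows continuation tokens for you.

using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ContinuationSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Without AsTableServiceQuery(), a plain DataServiceQuery stops at the first 1000 rows.
        CloudTableQuery<AttendeeEntity> query =
            (from a in context.CreateQuery<AttendeeEntity>("Attendees")
             where a.PartitionKey == "SummerSchool"
             select a).AsTableServiceQuery();

        int count = 0;
        foreach (AttendeeEntity a in query)   // enumeration issues follow-up requests as tokens are returned
            count++;
        Console.WriteLine(count);
    }
}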

Tables Recap

• Select a PartitionKey and RowKey that help you scale
  • Efficient for frequently used queries
  • Supports batch transactions
  • Distributes load
• Avoid "append only" patterns
  • Distribute by using a hash etc. as a prefix
• Always handle continuation tokens
  • Expect continuation tokens for range queries
• "OR" predicates are not optimized
  • Execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries
  • "Server busy" means either partitions are being load balanced to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
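A hedged sketch of those three tips in one place, reusing the AttendeeEntity class and "Attendees" table from the code later in this deck (the helper method itself is made up for illustration):

using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class DataServicesTipsSketch
{
    static AttendeeEntity LookupAttendee(CloudStorageAccount account, string email)
    {
        // Tip 1: a fresh context per logical operation avoids stale tracked entities
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Tip 3: a point query on a missing entity throws unless this is set;
        // with it set, the query simply returns no results
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).AsTableServiceQuery().Execute().FirstOrDefault();

        // Tip 2: calling context.AddObject/AttachTo on an entity this context already tracks
        // throws an exception – another reason to keep contexts short-lived.
    }
}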

Queues: Their Unique Role in Building Reliable, Scalable Applications

• We want roles that work closely together, but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
  • Commonly used with the work ticket pattern (see the sketch below)
• Why not simply use a table?
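A minimal sketch of the work ticket pattern: the (possibly large) payload goes into a blob, and only a small "ticket" – the blob name – travels through the queue. Container, queue and payload names are illustrative; the local development storage account is assumed.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class WorkTicketSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("work");
        container.CreateIfNotExist();
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("tickets");
        queue.CreateIfNotExist();

        // Web role: store the payload in a blob, enqueue a small ticket referencing it
        string blobName = Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadText("...large job description...");
        queue.AddMessage(new CloudQueueMessage(blobName));

        // Worker role: dequeue the ticket, fetch the payload, process, then clean up
        CloudQueueMessage ticket = queue.GetMessage(TimeSpan.FromMinutes(2));
        if (ticket != null)
        {
            string payload = container.GetBlobReference(ticket.AsString).DownloadText();
            // ... process payload ...
            container.GetBlobReference(ticket.AsString).Delete();
            queue.DeleteMessage(ticket);
        }
    }
}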

                                              Queue Terminology

Message Lifecycle

(Diagram.) A Web Role adds messages (Msg 1 … Msg 4) to a queue with PutMessage. Worker Roles retrieve them with GetMessage, which hides each returned message for a visibility timeout, and remove them with RemoveMessage once processing succeeds.

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1 (a sketch follows).
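A minimal sketch of truncated exponential backoff polling for a queue; the 64-second truncation point is an arbitrary assumption.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

class BackoffPollingSketch
{
    static void PollForever(CloudQueue queue)
    {
        int delaySeconds = 1;
        const int maxDelaySeconds = 64;          // truncation point (an assumption)

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                // ... process the message ...
                queue.DeleteMessage(msg);
                delaySeconds = 1;                // successful poll resets the interval
            }
            else
            {
                Thread.Sleep(TimeSpan.FromSeconds(delaySeconds));
                delaySeconds = Math.Min(delaySeconds * 2, maxDelaySeconds);   // each empty poll doubles it
            }
        }
    }
}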

Removing Poison Messages

Scenario (diagram): producers P1 and P2 put messages onto queue Q; consumers C1 and C2 process them. Each GetMessage(Q, 30 s) hides the returned message for 30 seconds and increments its dequeue count.

1. C1: Dequeue(Q, 30 sec) → msg 1
2. C2: Dequeue(Q, 30 sec) → msg 2
3. C2 consumes msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: Dequeue(Q, 30 sec) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarts
11. C1: Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2, so msg 1 is treated as a poison message
13. C1: Delete(Q, msg 1)
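A minimal sketch of step 12 in code: inspect DequeueCount and discard (or park) messages that keep failing. The threshold and the dead-letter queue are policy choices assumed here for illustration.

using System;
using Microsoft.WindowsAzure.StorageClient;

class PoisonMessageSketch
{
    const int MaxDequeueCount = 3;   // threshold is a policy choice

    static void ProcessOne(CloudQueue queue, CloudQueue deadLetterQueue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // The message has repeatedly failed processing – park it and get it out of the way
            deadLetterQueue.AddMessage(new CloudQueueMessage(msg.AsString));
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing; DeleteMessage only on success ...
        queue.DeleteMessage(msg);
    }
}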

Queues Recap

• No need to deal with failures – make message processing idempotent
• Invisible messages result in out-of-order delivery – do not rely on order
• Enforce a threshold on a message's dequeue count – use the dequeue count to remove poison messages
• Messages > 8 KB, batching of messages, garbage collection of orphaned blobs – use a blob to store the message data, with a reference in the message
• Dynamically increase/reduce workers – use the message count to scale

Windows Azure Storage Takeaways

Blobs
Drives
Tables
Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //   blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case:
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance need not equal one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch after this list)
    • Data parallelism
    • Task parallelism
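A minimal Task Parallel Library sketch contrasting the two forms mentioned above; the work items are placeholders.

using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        double[] results = new double[1000];

        // Data parallelism: the same operation over many items, spread across cores
        Parallel.For(0, results.Length, i =>
        {
            results[i] = Math.Sqrt(i);
        });

        // Task parallelism: independent operations running concurrently
        Task render = Task.Factory.StartNew(() => { /* build a report */ });
        Task upload = Task.Factory.StartNew(() => { /* push logs to blob storage */ });
        Task.WaitAll(render, upload);
    }
}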

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure/poor user experience from not having excess capacity, and the cost of idling VMs (performance vs. cost)

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
  • Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (see the sketch after this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

Pipeline (diagram): Uncompressed Content → Gzip, Minify JavaScript, Minify CSS, Minify Images → Compressed Content
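A hedged sketch of gzip-compressing ASP.NET output only when the client advertises support. Registering this module in web.config is assumed; the property and type names used are standard ASP.NET/.NET ones.

using System.IO.Compression;
using System.Web;

public class CompressionModuleSketch : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext context = app.Context;
            string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? "";

            if (acceptEncoding.Contains("gzip"))
            {
                // Wrap the response stream so everything written is compressed on the way out
                context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
                context.Response.AppendHeader("Content-Encoding", "gzip");
                context.Response.AppendHeader("Vary", "Accept-Encoding");
            }
        };
    }

    public void Dispose() { }
}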

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

                                              Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • A Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

                                              Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• One of the most important pieces of software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result reduction processing

Large-volume data
• A normal BLAST database can be as large as 10 GB
  • With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• AzureBLAST: a parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010

A simple Split/Join pattern

Leverage the multi-core capability of one instance
• argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation

Job flow (diagram): Splitting task → BLAST tasks running in parallel (…) → Merging task

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• The extra-large instance generated the best and the most economical throughput
• Fully utilizes the resources

AzureBLAST architecture (diagram): a Web Role hosts the Web Portal and Web Service (job registration); a Job Management Role contains the Job Scheduler, the Job Registry (Azure Table) and the Scaling Engine; a global dispatch queue feeds the Worker roles; a database-updating Role refreshes the NCBI databases; Azure Blob storage holds the BLAST databases, temporary data, etc.


Web Portal / Web Service

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored into the job registry table
  • Fault tolerance: avoid in-memory state

(Diagram: Job Portal – Web Portal and Web Service handle job registration, handing off to the Job Scheduler, Scaling Engine and Job Registry.)

R. palustris as a platform for H2 production
Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
  • In total, 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record should look like:

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an Update Domain (~30 mins, ~6 nodes per group)
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple
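A small helper that simply transcribes the Penman-Monteith expression above; all parameter values are supplied by the caller in the units listed, and no unit conversion or input validation is attempted here.

public static class PenmanMonteith
{
    // Returns ET given the terms defined above (SI units as listed).
    public static double Et(
        double delta,    // Δ  – rate of change of saturation specific humidity with temperature (Pa/K)
        double rn,       // Rn – net radiation (W/m^2)
        double rhoA,     // ρa – dry air density (kg/m^3)
        double cp,       // cp – specific heat capacity of air (J/(kg·K))
        double deltaQ,   // δq – vapor pressure deficit (Pa)
        double ga,       // ga – conductivity of air (m/s)
        double gs,       // gs – conductivity of plant stoma (m/s)
        double gamma,    // γ  – psychrometric constant (~66 Pa/K)
        double lambdaV)  // λv – latent heat of vaporization (J/g)
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}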

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Architecture diagram: Scientists submit requests through the AzureMODIS Service Web Role Portal; a Request Queue feeds Download, Reprojection, Reduction 1, and Reduction 2 Queues that drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery is pulled from download sites, source metadata is tracked, and scientific results are made available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues requests to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: A <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and from/to local Azure Worker instance files

[Diagram: The Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks and read/write <Input>Data Storage.]

GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: A Reprojection Request flows through the Job Queue to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate student rates

[Pipeline diagram, repeated from above, annotated with per-stage data volumes and costs.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest scale computer centers ever constructed and have the potential to be important to both large and small scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Provide valuable fault tolerance and scalability abstractions
• Clouds as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                              • Day 2 - Azure as a PaaS
                                              • Day 2 - Applications

                                                Service Management API

• REST-based API to manage your services
• X509 certs for authentication
• Lets you create, delete, change, upgrade, swap...
• Lots of community and MSFT-built tools around the API
  - Easy to roll your own
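As a hedged illustration of rolling your own client, the sketch below lists hosted services with an HttpWebRequest and a management certificate. The subscription ID and certificate thumbprint are placeholders, and the x-ms-version value shown is an assumed API version from that era.

// Sketch: listing hosted services through the Service Management REST API.
using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ServiceManagementSample
{
    static void Main()
    {
        string subscriptionId = "<your-subscription-id>";   // placeholder
        string thumbprint = "<management-cert-thumbprint>"; // placeholder

        // Look up the management certificate in the local certificate store.
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert =
            store.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false)[0];
        store.Close();

        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2011-10-01"); // assumed version header value
        request.ClientCertificates.Add(cert);               // X509 cert authenticates the call

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd()); // XML list of hosted services
        }
    }
}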

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure.

1. Process service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

                                                Storage

Durable Storage at Massive Scale
• Blob – massive files, e.g., videos, logs
• Drive – use standard file system APIs
• Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues – facilitate loosely coupled, reliable systems

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob, overwrites an existing blob
  • GetBlob – get the whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob

Two Types of Blobs Under the Hood
• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
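To make the block-blob model concrete, here is a sketch of uploading a large file in 4 MB blocks with the v1.x StorageClient library and committing them with a block list; the container and block-sizing choices are illustrative.

// Sketch: upload a file as a block blob, block by block, then commit the block list.
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;

public static class BlockBlobUpload
{
    public static void Upload(string filePath, CloudBlobContainer container, string blobName)
    {
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        var blockIds = new List<string>();
        byte[] buffer = new byte[4 * 1024 * 1024]; // 4 MB per block (illustrative)

        using (FileStream file = File.OpenRead(filePath))
        {
            int read, index = 0;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Block IDs must be base64 strings of equal length within one blob.
                string blockId = Convert.ToBase64String(BitConverter.GetBytes(index++));
                blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
                blockIds.Add(blockId);
            }
        }

        blob.PutBlockList(blockIds); // committing the list makes the blob visible
    }
}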

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob
  • The drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted

Windows Azure Tables
• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available and durable
  • Data is replicated several times
• Familiar and easy to use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues
• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple, asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST
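A minimal sketch of the queue flow, assuming the v1.x StorageClient library and the local storage emulator; the queue name and message text are illustrative work tickets.

// Sketch: produce and consume a work ticket on a Windows Azure Queue.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QueueQuickstart
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount; // local emulator
        CloudQueueClient client = account.CreateCloudQueueClient();
        CloudQueue queue = client.GetQueueReference("tasks");
        queue.CreateIfNotExist();

        // Producer: enqueue a small "work ticket" (e.g., a blob name to process).
        queue.AddMessage(new CloudQueueMessage("images/tile-0001.png"));

        // Consumer: dequeue with a 30-second visibility timeout, process, then delete.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            Console.WriteLine("Processing " + msg.AsString);
            queue.DeleteMessage(msg); // only delete after successful processing
        }
    }
}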

Storage Partitioning

Understanding partitioning is key to understanding performance
• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

"Server Busy" and load balancing
• Use exponential backoff on "Server Busy"
• The system load balances to meet your traffic needs
• "Server Busy" can also mean the limits of a single partition have been reached

Partition Keys in Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
...                     | ...                      | ...       | ...
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
...                     | ...                      | ...       | ...
Comedy                  | Office Space             | ...       | 1999
...                     | ...                      | ...       | ...
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
...                     | ...                      | ...       | ...
War                     | Defiance                 | ...       | 2008

The single logical table above can be split into partition ranges (for example, Action through Animation and Comedy through War), and each range can be served by a different server.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity Group Transactions
• Transactions across a single partition
• Transaction semantics and reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
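A sketch of reading a full table while continuation tokens are in play, reusing the AttendeeEntity/TableServiceContext types shown later in this deck. The AsTableServiceQuery() wrapper follows the continuation tokens internally; if you issue raw queries yourself, you must follow the x-ms-continuation-NextPartitionKey / NextRowKey headers manually.

// Sketch: enumerate every row even when the server pages the results.
using System.Collections.Generic;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationSample
{
    public static List<AttendeeEntity> ReadAll(TableServiceContext context, string tableName)
    {
        // CloudTableQuery<T> transparently follows continuation tokens across
        // the 1000-row, partition-boundary, and 5-second server limits.
        CloudTableQuery<AttendeeEntity> query =
            context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

        var results = new List<AttendeeEntity>();
        foreach (AttendeeEntity entity in query.Execute()) // continuation handled internally
            results.Add(entity);
        return results;
    }
}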

Tables Recap
• Select PartitionKey and RowKey that help scale – efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns – distribute by using a hash, etc., as a prefix
• Always handle continuation tokens – expect continuation tokens for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "Server busy" means partitions are being load balanced to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
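A small sketch of that point-query pattern, borrowing the AttendeeEntity conventions used later in this deck; the table name and key values are assumptions.

// Sketch: a point query that returns null instead of throwing when the entity is missing.
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQuerySample
{
    public static AttendeeEntity TryGetAttendee(TableServiceContext context, string email)
    {
        // Without this flag, a missing entity surfaces as a 404 exception.
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).FirstOrDefault();
    }
}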

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
• Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology: Message Lifecycle

[Diagram: A Web Role calls PutMessage to add messages (Msg 1..Msg 4) to a Queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed.]

PutMessage request:
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage request:
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling
• Consider a back-off polling approach: each empty poll increases the polling interval by 2x
• A successful poll resets the interval back to 1
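A sketch of that truncated back-off loop over a CloudQueue; the 1-second floor, 60-second cap, and 30-second visibility timeout are illustrative choices.

// Sketch: truncated exponential back-off when polling a queue.
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class BackOffPoller
{
    public static void Poll(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(60); // the "truncated" cap

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                process(msg);
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1); // successful poll: reset interval
            }
            else
            {
                Thread.Sleep(interval);             // empty poll: wait, then double
                interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
            }
        }
    }
}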


Removing Poison Messages

[Diagram: producers P1 and P2 feed a queue consumed by C1 and C2.]
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

Removing Poison Messages (continued)
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

Removing Poison Messages (continued)
1. C1: Dequeue(Q, 30 sec) → msg 1
2. C2: Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. Delete(Q, msg 2)
5. C1 crashed
6. msg 1 visible 30 sec after dequeue
7. C2: Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 visible 30 sec after dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)
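A sketch of the dequeue-count check from the walkthrough above, assuming the storage client exposes the message's DequeueCount (as the queue REST API does); the threshold of 3 and the handling of the poisoned message are illustrative.

// Sketch: discard "poison" messages that have repeatedly failed processing.
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageHandling
{
    private const int MaxDequeueCount = 3; // illustrative threshold

    public static void ProcessNext(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Poison message: log or copy it to a dead-letter store, then remove it.
            queue.DeleteMessage(msg);
            return;
        }

        process(msg);
        // Only delete after success, so a crash above lets the message
        // reappear when its visibility timeout expires.
        queue.DeleteMessage(msg);
    }
}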

Queues Recap
• No need to deal with failures – make message processing idempotent
• Invisible messages result in out-of-order delivery – do not rely on order
• Enforce a threshold on a message's dequeue count – use the dequeue count to remove poison messages
• Messages > 8 KB – use a blob to store the message data with a reference in the message; batch messages; garbage collect orphaned blobs
• Dynamically increase/reduce workers – use the message count to scale

Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                A Quick Exercise

...Then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    // Storage key shown inline for brevity in this sample.
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        // Make the container publicly readable.
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at:
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case:
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred.
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                Best Practices

                                                Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – larger, fewer VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting up code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch after this list)
    • Data parallelism
    • Task parallelism
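A brief sketch of both flavors with the .NET 4 Task Parallel Library; ProcessTile and BuildIndex are hypothetical placeholders for your own work items.

// Sketch: data and task parallelism inside a worker role loop.
using System.Threading.Tasks;

public static class TplSketch
{
    public static void ProcessBatch(string[] tileNames)
    {
        // Data parallelism: apply the same operation to every item, using all cores.
        Parallel.ForEach(tileNames, tileName => ProcessTile(tileName));

        // Task parallelism: run an independent operation concurrently and wait for it.
        Task indexing = Task.Factory.StartNew(() => BuildIndex(tileNames));
        indexing.Wait();
    }

    private static void ProcessTile(string tileName) { /* ... work per tile ... */ }
    private static void BuildIndex(string[] tileNames) { /* ... independent work ... */ }
}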

                                                Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• Trade-off between the risk of failure/poor user experience due to not having excess capacity and the cost of having idling VMs (performance vs. cost)

                                                Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
All of these tips have the side benefit of improving your web app's performance and user experience.

                                                Compressing Content

1. Gzip all output content (see the sketch after this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip and JavaScript/CSS/image minification to become compressed content.]
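A sketch of gzip-compressing ASP.NET output when the client advertises support, written as an HttpModule; hooking Application_BeginRequest in Global.asax works the same way.

// Sketch: wrap the response stream in a GZipStream for clients that accept gzip.
using System.IO.Compression;
using System.Web;

public class CompressionModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext context = app.Context;
            string acceptEncoding = context.Request.Headers["Accept-Encoding"];
            if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
            {
                // Everything written to the response is now gzipped on the fly.
                context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
                context.Response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}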

                                                Best Practices Summary

Doing 'less' is the key to saving costs.
Measure everything.
Know your application profile in and out.

                                                Research Examples in the Cloud

...on another set of slides

                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – MapReduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion...

                                                Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for the small, medium, large, and extra large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transferring overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead (see the sketch below)

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of an instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and test runs
  • Watch out for the 2-hour maximum limitation
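A sketch of the query-segmentation step under these assumptions: partitions of 100 sequences (the profiling result quoted below), partition data stored in blobs, and only a blob reference placed on the task queue to stay under the 8 KB message limit. The names and helper structure are illustrative, not the AzureBlast source.

// Sketch: split input sequences into fixed-size partitions and enqueue work tickets.
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.StorageClient;

public static class QuerySegmentation
{
    public static void SubmitJob(IList<string> sequences, CloudQueue taskQueue,
                                 CloudBlobContainer container, string jobId)
    {
        const int partitionSize = 100; // sequences per BLAST task (from profiling)

        for (int i = 0; i < sequences.Count; i += partitionSize)
        {
            int count = Math.Min(partitionSize, sequences.Count - i);
            string partitionName = jobId + "/partition-" + (i / partitionSize) + ".fasta";

            // Store the partition in blob storage and enqueue only a reference to it,
            // keeping the queue message well under the 8 KB limit.
            var partition = new List<string>(sequences).GetRange(i, count);
            CloudBlob blob = container.GetBlobReference(partitionName);
            blob.UploadText(string.Join("\n", partition.ToArray()));

            taskQueue.AddMessage(new CloudQueueMessage(partitionName));
        }
    }
}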

[Diagram: a Splitting task fans out into many BLAST tasks, whose outputs are combined by a Merging task.]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size/instance size vs. cost
• The extra large instance generated the best and most economical throughput
• Fully utilizes the resources

Diagram – AzureBLAST architecture: a Web Role hosts the Web Portal, Web Service, and job registration; a Job Management Role runs the Job Scheduler, Scaling Engine, and a global dispatch queue; Worker instances execute the splitting task, the parallel BLAST tasks, and the merging task; an Azure Table holds the Job Registry; Azure Blob storage holds the NCBI/BLAST databases, temporary data, etc.; a database-updating Role refreshes the databases.

ASP.NET program hosted by a Web Role instance
• Submit jobs
• Track job status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

Diagram – job portal components: Web Portal, Web Service, job registration, Job Scheduler, Scaling Engine, Job Registry.

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly reduced the computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four data centers: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, ~30 minutes) – this is an update domain
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain mechanism was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

Diagram – MODISAzure pipeline: the AzureMODIS Service Web Role portal receives requests and tracks source metadata; a Download Queue feeds the Data Collection Stage (which pulls from the source imagery download sites), a Reprojection Queue feeds the Reprojection Stage, and the Reduction 1 and Reduction 2 Queues feed the Derivation and Analysis Reduction Stages; scientists download the scientific results.

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• The execution status of all jobs and tasks is persisted in Tables

Diagram – a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage> Task Queue.

All the work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage between Azure blob storage and local files on the Azure Worker instance

Diagram – the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work through the <PipelineStage> Task Queue to GenericWorker (Worker Role) instances, which read their inputs from <Input> data storage.

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

Diagram – reprojection flow: a Reprojection Request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches work through the Job and Task Queues to GenericWorker (Worker Role) instances, which read from swath source data storage and write to reprojection data storage.
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Diagram – costs by pipeline stage (same pipeline as above, fronted by the AzureMODIS Service Web Role portal):

Stage                      | Data                             | Compute                     | Cost
Data Collection Stage      | 400-500 GB, 60K files, 10 MB/sec | 11 hours, <10 workers       | $50 upload, $450 storage
Reprojection Stage         | 400 GB, 45K files                | 3,500 hours, 20-100 workers | $420 CPU, $60 download
Derivation Reduction Stage | 5-7 GB, 55K files                | 1,800 hours, 20-100 workers | $216 CPU, $1 download, $6 storage
Analysis Reduction Stage   | <10 GB, ~1K files                | 1,800 hours, 20-100 workers | $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                • Day 2 - Azure as a PaaS
                                                • Day 2 - Applications

The Secret Sauce – The Fabric

The Fabric is the 'brain' behind Windows Azure:

1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role based on policy
   2. If a node fails, migrate the role based on policy

                                                  Storage

Durable Storage at Massive Scale

• Blobs – massive files, e.g., videos, logs
• Drives – use standard file system APIs
• Tables – non-relational, but with few scale limits (use SQL Azure for relational data)
• Queues – facilitate loosely coupled, reliable systems

                                                  Blob Features and Functions

• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface:
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob
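The same operations are wrapped by the managed StorageClient library. A minimal sketch, assuming the v1.x Microsoft.WindowsAzure.StorageClient API; the blob names are made up for illustration.

using Microsoft.WindowsAzure.StorageClient;

public static class BlobOperationsDemo
{
    public static void Run(CloudBlobContainer container)
    {
        CloudBlob blob = container.GetBlobReference("hello.txt");
        blob.UploadText("hello");                                          // PutBlob
        string text = blob.DownloadText();                                 // GetBlob

        CloudBlob snapshot = blob.CreateSnapshot();                        // SnapshotBlob
        container.GetBlobReference("hello-copy.txt").CopyFromBlob(blob);   // CopyBlob

        blob.Delete();                                                     // DeleteBlob
    }
}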

                                                  Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
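A minimal sketch of the two blob types, assuming the v1.x StorageClient API; container contents, names, and sizes are illustrative.

using System;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobTypesDemo
{
    public static void Run(CloudBlobContainer container)
    {
        // Block blob: upload a block, then commit the block list (streaming-style workloads).
        CloudBlockBlob blockBlob = container.GetBlockBlobReference("log.dat");
        string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes("block-000001"));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes("first chunk")))
            blockBlob.PutBlock(blockId, ms, null);
        blockBlob.PutBlockList(new[] { blockId });

        // Page blob: create a fixed-size blob and write 512-byte-aligned pages (random I/O workloads).
        CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");
        pageBlob.Create(1024 * 1024);                        // 1 MB page blob
        byte[] page = new byte[512];
        using (var ms = new MemoryStream(page))
            pageBlob.WritePages(ms, 0);                      // write the first page at offset 0
    }
}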

                                                  Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\ – http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
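A minimal sketch of mounting a drive, assuming the v1.x Windows Azure Drive API (Microsoft.WindowsAzure.CloudDrive assembly); the local resource name, blob URL, and sizes are illustrative placeholders.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class DriveDemo
{
    public static string MountDataDrive(CloudStorageAccount account)
    {
        // Reserve local disk for the drive cache, then mount the page blob as an NTFS volume.
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");   // declared in the service definition
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        CloudDrive drive = account.CreateCloudDrive(
            "http://<accountname>.blob.core.windows.net/drives/data.vhd");      // page blob backing the drive
        drive.Create(1024);                                                      // 1 GB; throws if it already exists
        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);  // e.g., returns "X:\"
    }
}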

                                                  Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

                                                  Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

                                                  Storage Partitioning

Understanding partitioning is key to understanding performance

• Every data object has a partition key; it is different for each data type (blobs, entities, queues)
  • A partition can be served by a single server
  • The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
  • Load balancing can take a few minutes to kick in
  • It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"
• The system load-balances to meet your traffic needs
• "Server Busy" means the single-partition limits have been reached

                                                  Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

• Blobs – container name + blob name: every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

• Messages – queue name: all messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message 1
jobs     | Message 2
workflow | Message 1

                                                  Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When the limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (see the sketch after this slide).
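One way to get that back-off behavior from the client library is to set a retry policy; a minimal sketch assuming the v1.x StorageClient retry policies, with arbitrary retry counts and delays.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryConfig
{
    // Retries transient failures (such as 503 Server Busy) with exponential back-off.
    public static CloudTableClient CreateTableClientWithBackoff(CloudStorageAccount account)
    {
        CloudTableClient tables = account.CreateCloudTableClient();
        tables.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(2));
        return tables;
    }

    public static CloudBlobClient CreateBlobClientWithBackoff(CloudStorageAccount account)
    {
        CloudBlobClient blobs = account.CreateCloudBlobClient();
        blobs.RetryPolicy = RetryPolicies.RetryExponential(
            RetryPolicies.DefaultClientRetryCount, RetryPolicies.DefaultClientBackoff);
        return blobs;
    }
}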

Partitions and Partition Ranges

Example – a movie table with PartitionKey = Category and RowKey = Title:

PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
Action                  | Fast & Furious            | …         | 2009
Action                  | The Bourne Ultimatum      | …         | 2007
…                       | …                         | …         | …
Animation               | Open Season 2             | …         | 2009
Animation               | The Ant Bully             | …         | 2006
…                       | …                         | …         | …
Comedy                  | Office Space              | …         | 1999
…                       | …                         | …         | …
SciFi                   | X-Men Origins: Wolverine  | …         | 2009
…                       | …                         | …         | …
War                     | Defiance                  | …         | 2008

The system can split the table into partition ranges served by different servers, e.g., one range covering Action through Animation and another covering Comedy through War.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• PartitionKey is critical for scalability

Query efficiency & speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

A query can return a continuation token:
• At a maximum of 1,000 rows in a response
• At the end of a partition range boundary
• After a maximum of 5 seconds of query execution
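In the managed library, wrapping a query with AsTableServiceQuery() makes the client follow continuation tokens for you. A minimal sketch, assuming the v1.x StorageClient API and the AttendeeEntity type defined later in this deck.

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationDemo
{
    public static int CountAttendees(TableServiceContext context)
    {
        // CloudTableQuery<T>.Execute() transparently follows continuation tokens,
        // so this enumerates all entities even when the server returns 1,000-row pages.
        CloudTableQuery<AttendeeEntity> query =
            (from a in context.CreateQuery<AttendeeEntity>("Attendees")
             where a.PartitionKey == "SummerSchool"
             select a).AsTableServiceQuery();

        int count = 0;
        foreach (AttendeeEntity attendee in query.Execute())
            count++;
        return count;
    }
}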

Tables Recap

• Select a PartitionKey and RowKey that help scale – efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns – distribute by using a hash etc. as a prefix
• Always handle continuation tokens – expect continuation tokens for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "Server Busy" means either the system is load-balancing partitions to meet traffic needs or the load on a single partition has exceeded the limits

                                                  WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
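A minimal sketch of those tips (v1.x StorageClient and WCF Data Services assumed; it reuses the AttendeeEntity type defined later in this deck).

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryDemo
{
    public static AttendeeEntity TryGetAttendee(CloudTableClient client, string email)
    {
        // Fresh context per logical operation, per the guidance above.
        TableServiceContext context = client.GetDataServiceContext();

        // Without this, a point query for a missing entity throws instead of returning null.
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).FirstOrDefault();
    }
}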

Queues: Their Unique Role in Building Reliable, Scalable Applications

• We want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
• Commonly use the work ticket pattern (see the sketch below)
• Why not simply use a table?
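A minimal sketch of the work ticket pattern mentioned above (v1.x StorageClient assumed; the blob naming scheme is made up): the large payload goes into a blob, and only a small "ticket" naming that blob travels through the queue.

using Microsoft.WindowsAzure.StorageClient;

public static class WorkTickets
{
    // Producer: store the payload in a blob, enqueue a small ticket that names it.
    public static void Submit(CloudBlobContainer container, CloudQueue queue, string jobId, string payload)
    {
        string blobName = "tickets/" + jobId;
        container.GetBlobReference(blobName).UploadText(payload);
        queue.AddMessage(new CloudQueueMessage(blobName));   // well under the 8 KB limit
    }

    // Consumer: dequeue the ticket, fetch the payload, do the work, then delete both.
    public static void ProcessOne(CloudBlobContainer container, CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg == null) return;                             // queue is empty

        CloudBlob blob = container.GetBlobReference(msg.AsString);
        string payload = blob.DownloadText();
        // ... do the actual work with 'payload' ...

        blob.Delete();
        queue.DeleteMessage(msg);
    }
}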

                                                  Queue Terminology

                                                  Message Lifecycle

Diagram – message lifecycle: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to the queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed.

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
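The managed-code equivalent of the REST exchange above, as a sketch assuming the v1.x StorageClient; it runs against the local development storage account here, and the queue name and message body are illustrative.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueLifecycleDemo
{
    public static void Run()
    {
        CloudQueue queue = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudQueueClient()
            .GetQueueReference("myqueue");
        queue.CreateIfNotExist();

        queue.AddMessage(new CloudQueueMessage("hello"));                    // PutMessage

        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // 30 s visibility timeout
        if (msg != null)
        {
            // ... process msg.AsString ...
            queue.DeleteMessage(msg);    // issues the DELETE with the pop receipt for you
        }
    }
}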

                                                  Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1.

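A sketch of that polling loop (the interval values are arbitrary; v1.x StorageClient assumed).

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class BackoffPoller
{
    public static void PollWithBackoff(CloudQueue queue)
    {
        TimeSpan minDelay = TimeSpan.FromSeconds(1);
        TimeSpan maxDelay = TimeSpan.FromMinutes(2);      // truncation point
        TimeSpan delay = minDelay;

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                // ... process the message, then remove it ...
                queue.DeleteMessage(msg);
                delay = minDelay;                          // successful poll resets the interval
            }
            else
            {
                Thread.Sleep(delay);                       // empty poll: wait, then double the interval
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}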

                                                  Removing Poison Messages

Diagram – producers (P1, P2) put messages on the queue; consumers (C1, C2) retrieve them:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

                                                  Removing Poison Messages

Diagram – continued:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

                                                  Removing Poison Messages

Diagram – continued (the poison-message case):
1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)
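The dequeue-count check in step 12 looks roughly like this; a sketch assuming the v1.x StorageClient, where the threshold and the dead-letter handling are illustrative choices rather than part of the platform.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageHandler
{
    public static void ProcessWithPoisonCheck(CloudQueue queue, CloudQueue deadLetterQueue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > 2)
        {
            // Poison message: park it for offline inspection instead of retrying forever.
            deadLetterQueue.AddMessage(new CloudQueueMessage(msg.AsString));
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing; delete only after the work succeeds ...
        queue.DeleteMessage(msg);
    }
}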

                                                  Queues Recap

• Make message processing idempotent – then there is no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages – enforce a threshold on a message's dequeue count
• Use a blob to store message data, with a reference in the message – for messages over 8 KB and for batching messages; remember to garbage-collect orphaned blobs
• Use the message count to scale – dynamically increase or reduce workers

                                                  Windows Azure Storage Takeaways

                                                  Blobs

                                                  Drives

                                                  Tables

                                                  Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                  A Quick Exercise

…Then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, at http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                  Best Practices

                                                  Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                  Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each of which barely uses its CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

                                                  Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
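A small, self-contained sketch of the two TPL styles mentioned above; the work items and console output are placeholders, not part of the deck's sample code.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencySketch
{
    static void Main()
    {
        List<int> workItems = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Data parallelism: the runtime spreads iterations across the available cores.
        Parallel.ForEach(workItems, item => Console.WriteLine("Processed item " + item));

        // Task parallelism: independent units of work run concurrently.
        Task t1 = Task.Factory.StartNew(() => Console.WriteLine("Network-bound work"));
        Task t2 = Task.Factory.StartNew(() => Console.WriteLine("Storage-bound work"));
        Task.WaitAll(t1, t2);
    }
}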

                                                  Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                  Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idling VMs

                                                  Performance Cost

                                                  Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

                                                  Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                  Compressing Content

1. Gzip all output content (see the sketch after the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed content → Gzip → Compressed content; also minify JavaScript, minify CSS, minify images]
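A minimal sketch of point 1 in ASP.NET, assuming an IHttpModule registered in web.config (the module name is hypothetical); it compresses the response only when the client advertises gzip support.

using System;
using System.IO.Compression;
using System.Web;

public class GzipModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpResponse response = app.Context.Response;
            string acceptEncoding = app.Context.Request.Headers["Accept-Encoding"];
            if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
            {
                // Everything written to the response is now gzip-compressed on the fly.
                response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
                response.AppendHeader("Content-Encoding", "gzip");
            }
        };
    }

    public void Dispose() { }
}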

                                                  Best Practices Summary

Doing 'less' is the key to saving costs

                                                  Measure everything

                                                  Know your application profile in and out

                                                  Research Examples in the Cloud

…on another set of slides

                                                  Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                  76

                                                  Project Daytona - Map Reduce on Azure

                                                  httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                                                  77

Questions and Discussion…

                                                  Thank you for hosting me at the Summer School

                                                  BLAST (Basic Local Alignment Search Tool)

                                                  bull The most important software in bioinformatics

                                                  bull Identify similarity between bio-sequences

                                                  Computationally intensive

                                                  bull Large number of pairwise alignment operations

                                                  bull A BLAST running can take 700 ~ 1000 CPU hours

                                                  bull Sequence databases growing exponentially

                                                  bull GenBank doubled in size in about 15 months

                                                  It is easy to parallelize BLAST

                                                  bull Segment the input

                                                  bull Segment processing (querying) is pleasingly parallel

                                                  bull Segment the database (eg mpiBLAST)

                                                  bull Needs special result reduction processing

                                                  Large volume data

                                                  bull A normal Blast database can be as large as 10GB

                                                  bull 100 nodes means the peak storage bandwidth could reach to 1TB

                                                  bull The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • split the input sequences
  • query the partitions in parallel
  • merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• large partitions: load imbalance
• small partitions: unnecessary overheads (NCBI-BLAST start-up, data transfer)
• Best practice: profile with test runs and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• essentially an estimate of the task run time
• too small: repeated computation
• too large: an unnecessarily long wait if an instance fails
• Best practice: estimate the value based on the number of base pairs in the partition and on test runs
• Watch out for the 2-hour maximum limitation
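A rough sketch of how a worker might honor such an estimate with the v1.x StorageClient API used elsewhere in this deck; the queue name, connection string, and RunBlast method are placeholders.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastWorkerSketch
{
    static void ProcessOneTask(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudQueue tasks = account.CreateCloudQueueClient().GetQueueReference("blasttasks");

        // Hide the message for the estimated task duration (well under the 2-hour cap).
        CloudQueueMessage msg = tasks.GetMessage(TimeSpan.FromMinutes(30));
        if (msg == null) return;

        RunBlast(msg.AsString);     // do the work while the message is invisible
        tasks.DeleteMessage(msg);   // delete only once the task has really finished
    }

    static void RunBlast(string partition) { /* placeholder for the real BLAST call */ }
}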

[Diagram: a Splitting task fans the input out into many parallel BLAST tasks, followed by a Merging task]

                                                  Task size vs Performance

                                                  bull Benefit of the warm cache effect

                                                  bull 100 sequences per partition is the best choice

                                                  Instance size vs Performance

                                                  bull Super-linear speedup with larger size worker instances

                                                  bull Primarily due to the memory capability

                                                  Task SizeInstance Size vs Cost

                                                  bull Extra-large instance generated the best and the most economical throughput

                                                  bull Fully utilize the resource

[Diagram: AzureBLAST architecture – a Web Role hosts the Web Portal and Web Service (job registration); a Job Management Role runs the Job Scheduler and Scaling Engine; a global dispatch queue feeds the Worker instances; a Database Updating Role maintains the databases; Azure Tables hold the Job Registry; Azure Blob storage holds the NCBI/BLAST databases and temporary data]


ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Job Portal (Web Portal + Web Service) registers jobs with the Job Scheduler, which works with the Scaling Engine and the Job Registry]

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• against all NCBI non-redundant proteins: completed in 30 min
• against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• discover the interrelationships of known protein sequences

"All against All" query
• the database is also the input query
• the protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• theoretically 100 billion sequence comparisons

Performance estimation
• based on sample runs on one extra-large Azure instance
• would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four data centers: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • each segment is submitted to one deployment as one job for execution
  • each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually

[Chart: extra-large VM instances per deployment – 50, 62, 62, 62, 62, 62, 50, 62]

• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


                                                  A normal log record should be

                                                  Otherwise something is wrong (eg task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an update domain (~30 mins, ~6 nodes per group)
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks completed, and then the job was killed
• A reasonable guess: the fault domain was at work

                                                  MODISAzure Computing Evapotranspiration (ET) in The Cloud

                                                  You never miss the water till the well has run dry Irish Proverb

                                                  ET = Water volume evapotranspired (m3 s-1 m-2)

                                                  Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                                                  λv = Latent heat of vaporization (Jg)

                                                  Rn = Net radiation (W m-2)

                                                  cp = Specific heat capacity of air (J kg-1 K-1)

                                                  ρa = dry air density (kg m-3)

                                                  δq = vapor pressure deficit (Pa)

                                                  ga = Conductivity of air (inverse of ra) (m s-1)

                                                  gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

                                                  γ = Psychrometric constant (γ asymp 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

                                                  Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Diagram: MODISAzure pipeline – the AzureMODIS Service Web Role Portal receives requests; a Request Queue and source metadata drive the Data Collection Stage, which pulls imagery from the source download sites via a Download Queue; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; scientists download the scientific results]

                                                  httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue, from which GenericWorker (Worker Role) instances pull tasks and read <Input>Data Storage]

                                                  bull Dequeues tasks created by the Service Monitor

                                                  bull Retries failed tasks 3 times

                                                  bull Maintains all task status

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches via the Job and Task Queues to GenericWorker (Worker Role) instances that read Swath Source Data Storage and write Reprojection Data Storage]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

                                                  bull Computational costs driven by data scale and need to run reduction multiple times

                                                  bull Storage costs driven by data scale and 6 month project duration

                                                  bull Small with respect to the people costs even at graduate student rates

[Diagram repeated: the MODISAzure pipeline stages and queues, annotated with per-stage data volumes and costs]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                  • Day 2 - Azure as a PaaS
                                                  • Day 2 - Applications

                                                    Storage

                                                    Durable Storage At Massive Scale

                                                    Blob

                                                    - Massive files eg videos logs

                                                    Drive

                                                    - Use standard file system APIs

                                                    Tables - Non-relational but with few scale limits

                                                    - Use SQL Azure for relational data

                                                    Queues

                                                    - Facilitate loosely-coupled reliable systems

                                                    Blob Features and Functions

                                                    bull Store Large Objects (up to 1TB in size)

                                                    bull Can be served through Windows Azure CDN service

                                                    bull Standard REST Interface

                                                    bull PutBlob

• Inserts a new blob or overwrites an existing blob

                                                    bull GetBlob

                                                    bull Get whole blob or a specific range

                                                    bull DeleteBlob

                                                    bull CopyBlob

                                                    bull SnapshotBlob

                                                    bull LeaseBlob
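As an illustration of the two most common operations, in the same REST style the deck uses elsewhere (account, container, and blob names are placeholders; authorization and x-ms-version headers are omitted):

PUT http://myaccount.blob.core.windows.net/mycontainer/notes.txt
x-ms-blob-type: BlockBlob
Content-Length: 11

Hello Azure

GET http://myaccount.blob.core.windows.net/mycontainer/notes.txt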

                                                    Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks
    • Each block is identified by a Block ID
  • Size limit: 200 GB per blob

• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages
    • Each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

                                                    Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
• Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount the Page Blob http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:
  • All writes to the drive are made durable to the Page Blob
  • The drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted

                                                    Windows Azure Tables

• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language
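A minimal sketch of the .NET/LINQ surface, reusing the AttendeeEntity type and "Attendees" table from the earlier demo; the connection string is a placeholder.

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class TableQuerySketch
{
    static AttendeeEntity FindAttendee(string connectionString, string email)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Point query: PartitionKey + RowKey identifies exactly one entity.
        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).AsTableServiceQuery().FirstOrDefault();
    }
}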

                                                    Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST

                                                    Storage Partitioning

Understanding partitioning is key to understanding performance

Every data object has a partition key
• Different for each data type (blobs, entities, queues)
• Controls entity locality – the partition key is the unit of scale
• A partition can be served by a single server
• The system load-balances partitions based on traffic pattern
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

The system load-balances
• Use exponential backoff on "Server Busy"
• The system load-balances to meet your traffic needs
• "Server Busy" means the limits of a single partition have been reached

                                                    Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3             |              |                     | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

                                                    Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions

When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff
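One hand-rolled sketch of that backoff, wrapping any storage call; the retry count and delays are illustrative only, and the 503 check assumes the v1.x StorageClientException surface.

using System;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

static class BackoffSketch
{
    public static T WithBackoff<T>(Func<T> storageCall)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(500);
        TimeSpan maxDelay = TimeSpan.FromSeconds(30);
        for (int attempt = 0; attempt < 6; attempt++)
        {
            try
            {
                return storageCall();
            }
            catch (StorageClientException ex)
            {
                // Retry only on "503 Server Busy"; everything else bubbles up.
                if (ex.StatusCode != HttpStatusCode.ServiceUnavailable) throw;
                Thread.Sleep(delay);
                if (delay < maxDelay) delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
        return storageCall();   // final attempt; let any failure surface
    }
}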

Example – a movie table partitioned by Category:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | …         | 2009
  Action                  | The Bourne Ultimatum     | …         | 2007
  …                       | …                        | …         | …
  Animation               | Open Season 2            | …         | 2009
  Animation               | The Ant Bully            | …         | 2006
  …                       | …                        | …         | …
  Comedy                  | Office Space             | …         | 1999
  …                       | …                        | …         | …
  SciFi                   | X-Men Origins: Wolverine | …         | 2009
  …                       | …                        | …         | …
  War                     | Defiance                 | …         | 2008

The full table can be split into partition ranges (e.g. Action-Animation and Comedy-War) that the system serves from different servers.

                                                    Partitions and Partition Ranges

                                                    Key Selection Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

A query response can be cut short by any of the following, and then carries a continuation token:
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
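A sketch of walking the tokens explicitly with WCF Data Services, where context is assumed to be the TableServiceContext from the earlier TableHelper example; in practice AsTableServiceQuery() will follow the tokens for you, but this shows what happens underneath.

using System.Data.Services.Client;

var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");
DataServiceQueryContinuation<AttendeeEntity> token = null;
do
{
    // The first round executes the query; later rounds follow the continuation token.
    QueryOperationResponse<AttendeeEntity> response = (token == null)
        ? (QueryOperationResponse<AttendeeEntity>)query.Execute()
        : context.Execute(token);

    foreach (AttendeeEntity attendee in response)
    {
        // process each page (at most 1000 entities per response)
    }

    token = response.GetContinuation();
} while (token != null);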

Tables Recap
• Select a PartitionKey and RowKey that help scale – efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns – distribute by using a hash etc. as a prefix
• Always handle continuation tokens – expect them for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "Server Busy" means either the system is load-balancing partitions to meet traffic needs or the load on a single partition has exceeded the limits
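A tiny sketch of the hash-prefix idea for avoiding append-only PartitionKeys; the bucket count and formatting are arbitrary illustrative choices, not a recommendation from the deck.

using System;

static class PartitionKeySketch
{
    // Spread naturally sequential keys (e.g. timestamps) across a fixed number of partitions.
    public static string ToPartitionKey(string naturalKey, int buckets)
    {
        int bucket = Math.Abs(naturalKey.GetHashCode()) % buckets;
        return bucket.ToString("D2") + "_" + naturalKey;   // e.g. "07_2011-07-20"
    }
}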

                                                    WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist – use IgnoreResourceNotFoundException
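A short sketch of the first and last tips; the connection string is a placeholder.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static class ContextSketch
{
    public static TableServiceContext NewContext(string connectionString)
    {
        // A fresh context per logical operation avoids stale entity-tracking state.
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Point queries then return null instead of throwing when the entity is missing.
        context.IgnoreResourceNotFoundException = true;
        return context;
    }
}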

                                                    Queues Their Unique Role in Building Reliable Scalable Applications

• You want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
  • The work ticket pattern is commonly used
• Why not simply use a table?

                                                    Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; Worker Roles call GetMessage (with a timeout), which hides the message, and RemoveMessage once processing is complete]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll doubles the polling interval, up to some maximum ("truncated"), and a successful poll resets the interval back to 1.
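A minimal sketch of this polling loop (queue as in the earlier snippet; the 30 s visibility timeout, 1 s starting interval and 64 s cap are illustrative, and ProcessMessage is a hypothetical handler):

int intervalSeconds = 1;
const int maxIntervalSeconds = 64;

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        ProcessMessage(msg);
        queue.DeleteMessage(msg);
        intervalSeconds = 1;                                                  // successful poll: reset the interval
    }
    else
    {
        Thread.Sleep(TimeSpan.FromSeconds(intervalSeconds));
        intervalSeconds = Math.Min(intervalSeconds * 2, maxIntervalSeconds);  // empty poll: double, truncated at the cap
    }
}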


Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages into queue Q; consumers C1 and C2 dequeue them; each message carries a dequeue count]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

Removing Poison Messages

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

Removing Poison Messages

1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)
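A minimal sketch of the DequeueCount check illustrated above (the threshold of 2 matches the slide; queue and ProcessMessage are as in the earlier sketches):

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Poison message: it has repeatedly failed to be processed.
        // Optionally copy it to a blob or table for later inspection before discarding it.
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg);
        queue.DeleteMessage(msg);
    }
}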

Queues Recap

• Make message processing idempotent – no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order processing
• Use the dequeue count to remove poison messages – enforce a threshold on a message's dequeue count
• Messages > 8KB – use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs (see the sketch below)
• Use the message count to scale – dynamically increase/reduce workers
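A minimal sketch of the blob-reference pattern for payloads larger than 8KB (container and queue as in the earlier snippets; largePayload is a placeholder):

// Producer: store the large payload in a blob and enqueue only its name.
string blobName = Guid.NewGuid().ToString();
container.GetBlobReference(blobName).UploadText(largePayload);
queue.AddMessage(new CloudQueueMessage(blobName));

// Consumer: resolve the reference, process, then garbage-collect the blob.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    CloudBlob blob = container.GetBlobReference(msg.AsString);
    string payload = blob.DownloadText();
    // ... process payload ...
    blob.Delete();
    queue.DeleteMessage(msg);
}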

                                                    Windows Azure Storage Takeaways

                                                    Blobs

                                                    Drives

                                                    Tables

                                                    Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                    49

                                                    A Quick Exercise

…Then let's look at some code and some tools

                                                    50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                    51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                    52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                    53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                                    54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                    55

Code – Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                    56

Code – Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                    57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                    58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception.
    }
}

                                                    59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                    60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                    61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                    62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                    63

Tools – Fiddler2

                                                    Best Practices

                                                    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• Pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                    Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows

• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting up code into multiple roles, each not using much CPU

• Balance between using up CPU and having free capacity in times of need

• Multiple ways to use your CPU to the fullest

                                                    Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)

• Data parallelism

• Task parallelism
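A minimal sketch of both forms with the .NET 4 Task Parallel Library (workItems, ProcessItem and DoCpuBoundWork are hypothetical placeholders):

using System.Threading.Tasks;

// Data parallelism: spread one collection of work items across all cores of the VM.
Parallel.ForEach(workItems, item => ProcessItem(item));

// Task parallelism: run independent pieces of work concurrently.
Task t1 = Task.Factory.StartNew(() => DoCpuBoundWork());
Task t2 = Task.Factory.StartNew(() => DoCpuBoundWork());
Task.WaitAll(t1, t2);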

                                                    Finding Good Code Neighbors

• Typically code falls into one or more of these categories:

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                    Memory Intensive

                                                    CPU Intensive

                                                    Network IO Intensive

                                                    Storage IO Intensive

                                                    Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in poor user experience

• Trade-off between the risk of failure / poor user experience due to not having excess capacity and the cost of having idling VMs

                                                    Performance Cost

                                                    Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                                    Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

                                                    Sending fewer things over the wire often means getting fewer things from storage

                                                    Saving bandwidth costs often lead to savings in other places

                                                    Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                    Compressing Content

1. Gzip all output content (see the sketch below)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs

[Diagram: uncompressed content → Gzip, minify JavaScript, minify CSS, minify images → compressed content]
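A minimal sketch of on-the-fly gzip compression of an ASP.NET web role's output, wired up in Global.asax.cs (a common approach; treat the wiring as illustrative rather than the deck's own code):

using System;
using System.IO.Compression;
using System.Web;

protected void Application_BeginRequest(object sender, EventArgs e)
{
    HttpApplication app = (HttpApplication)sender;
    string acceptEncoding = app.Request.Headers["Accept-Encoding"] ?? string.Empty;

    if (acceptEncoding.Contains("gzip"))
    {
        // Wrap the output stream so everything written to the response is compressed on the fly.
        app.Response.Filter = new GZipStream(app.Response.Filter, CompressionMode.Compress);
        app.Response.AppendHeader("Content-Encoding", "gzip");
    }
}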

                                                    Best Practices Summary

Doing 'less' is the key to saving costs

                                                    Measure everything

                                                    Know your application profile in and out

                                                    Research Examples in the Cloud

…on another set of slides

                                                    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                    76

                                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                    77

Questions and Discussion…

                                                    Thank you for hosting me at the Summer School

                                                    BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700 ~ 1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large volume of data

• A normal BLAST database can be as large as 10GB

• 100 nodes means the peak storage bandwidth could reach 1TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partition: load imbalance
• Small partition: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • too small: repeated computation
  • too large: an unnecessarily long wait in case of instance failure
Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation

[Diagram: a Splitting task fans out to many BLAST tasks that run in parallel, followed by a Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler, the Scaling Engine and a database-updating role; a global dispatch queue feeds the Worker instances; Azure Tables hold the Job Registry; Azure Blobs hold the BLAST/NCBI databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Job Portal (Web Portal + Web Service) performs job registration against the Job Registry, feeding the Job Scheduler and the Scaling Engine]

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When the load became imbalanced, the load was redistributed manually


• The total size of the output result is ~230GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6~8 days

• Look into the log data to analyze what took place…


A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group – this is an update domain
• ~30 mins
• ~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed and the job was killed

A reasonable guess: the fault domain is at work

                                                    MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry" – Irish proverb

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                                    Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100MB (4K files)
• Vegetative clumping: ~5MB (1 file)
• Climate classification: ~1MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal receives requests into a Request Queue; a Download Queue drives the Data Collection Stage, which pulls source imagery from the download sites; a Reprojection Queue drives the Reprojection Stage; Reduction 1 and Reduction 2 Queues drive the Derivation Reduction and Analysis Reduction Stages; source metadata is kept alongside, and scientists download the scientific results]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • The execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries and dispatches tasks to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshals all storage between Azure blob storage and local files on the Azure Worker instance

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

[Diagram: the Service Monitor parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, from which GenericWorker (Worker Role) instances dequeue tasks and read <Input>Data Storage]

[Diagram: a Reprojection Request flows through the Service Monitor (persisting ReprojectionJobStatus and ReprojectionTaskStatus) via the Job and Task Queues to GenericWorker instances, which read Swath Source Data Storage and write Reprojection Data Storage]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction stages multiple times

• Storage costs are driven by the data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

[Cost and scale figures from the AzureMODIS pipeline diagram]
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                    • Day 2 - Azure as a PaaS
                                                    • Day 2 - Applications

Durable Storage, At Massive Scale

Blob
- Massive files, e.g. videos, logs

Drive
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data

Queues
- Facilitate loosely-coupled, reliable systems

                                                      Blob Features and Functions

• Store large objects (up to 1TB in size)

• Can be served through the Windows Azure CDN service

• Standard REST interface

• PutBlob
  • Inserts a new blob or overwrites the existing blob

• GetBlob
  • Gets the whole blob or a specific range

• DeleteBlob

• CopyBlob

• SnapshotBlob

• LeaseBlob

                                                      Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks
    • Each block is identified by a Block ID
  • Size limit: 200GB per blob

• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages
    • Each page is identified by its offset from the start of the blob
  • Size limit: 1TB per blob
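A minimal sketch contrasting the two types with the StorageClient library (the container is assumed to exist, as in the earlier BlobHelper snippets; file paths and sizes are illustrative):

// Block blob: uploaded as a sequence of blocks, good for streaming whole files.
CloudBlockBlob blockBlob = container.GetBlockBlobReference("video.wmv");
blockBlob.UploadFile(@"C:\data\video.wmv");

// Page blob: a fixed-size array of 512-byte pages, good for random read/write (e.g. VHDs).
CloudPageBlob pageBlob = container.GetPageBlobReference("disk.vhd");
pageBlob.Create(10 * 1024 * 1024);                              // size must be a multiple of 512 bytes
using (FileStream data = File.OpenRead(@"C:\data\page0.bin"))   // stream length must also be 512-byte aligned
{
    pageBlob.WritePages(data, 0);                               // write at an arbitrary page-aligned offset
}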

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access the durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob
  • Example: mount the Page Blob http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:\
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when it is not mounted
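A rough sketch only: the SDK 1.x CloudDrive API is recalled from memory (exact method names may differ), it reuses the AccountInformation helper shown later in this deck, and the resource name "DriveCache" and blob name "drives/mydata.vhd" are illustrative.

// Sketch: mounting a Windows Azure Drive from a role instance (assumed SDK 1.x API).
CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

// Reserve local disk as the drive's read cache (configured as a LocalStorage resource).
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

// The drive is just a Page Blob: create it once, then mount it.
CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");
drive.Create(1024);                                          // size in MB; throws if it already exists
string drivePath = drive.Mount(25, DriveMountOptions.None);  // e.g. "X:\"

// From here on, plain NTFS APIs work against the mounted path.
File.WriteAllText(Path.Combine(drivePath, "hello.txt"), "durably stored in a Page Blob");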

Windows Azure Tables

• Provides structured storage

• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows

• Highly available and durable
  • Data is replicated several times

• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST - with any platform or language

Windows Azure Queues

• Queues are performance efficient, highly available, and provide reliable message delivery
  • Simple asynchronous work dispatch

• Programming semantics ensure that a message is processed at least once

• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance

• Every data object has a partition key
  • Different for each data type (blobs, entities, queues)
  • The partition key is the unit of scale and controls entity locality

• A partition can be served by a single server
  • The system load balances partitions based on traffic pattern
  • Load balancing can take a few minutes to kick in
  • It can take a couple of seconds for a partition to become available on a different server

System load balancing
• "Server Busy" means the limits of a single partition have been reached
• Use exponential backoff on "Server Busy"
• The system load balances partitions to meet your traffic needs

Partition Keys In Each Abstraction

• Entities - TableName + PartitionKey
  Entities with the same PartitionKey value are served from the same partition.

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order - 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order - 3             |              |                     | $10.00

• Blobs - Container name + Blob name
  Every blob and its snapshots are in a single partition.

  Container Name | Blob Name
  image          | annarbor/bighouse.jpg
  image          | foxborough/gillette.jpg
  video          | annarbor/bighouse.jpg

• Messages - Queue Name
  All messages for a single queue belong to the same partition.

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

Scalability Targets

Storage Account
• Capacity - up to 100 TB
• Transactions - up to a few thousand requests per second
• Bandwidth - up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.

When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff (a retry-policy sketch follows this list).
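One way to get the recommended back-off without hand-rolling it is the retry policy built into the v1.x StorageClient library; a sketch, where the retry count and back-off delta are illustrative and the AccountInformation helper comes from the code later in this deck:

// Sketch: exponential retry for transient '503 Server Busy' responses and timeouts.
CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.RetryPolicy = RetryPolicies.RetryExponential(
    5,                             // retry count (illustrative)
    TimeSpan.FromSeconds(2));      // back-off delta (illustrative)

CloudBlobClient blobClient = account.CreateCloudBlobClient();
blobClient.RetryPolicy = RetryPolicies.RetryExponential(5, TimeSpan.FromSeconds(2));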

Partitions and Partition Ranges

Example: a movie table whose PartitionKey is the Category and whose RowKey is the Title.

PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
Action                  | Fast & Furious            | ...       | 2009
Action                  | The Bourne Ultimatum      | ...       | 2007
...                     | ...                       | ...       | ...
Animation               | Open Season 2             | ...       | 2009
Animation               | The Ant Bully             | ...       | 2006
...                     | ...                       | ...       | ...
Comedy                  | Office Space              | ...       | 1999
...                     | ...                       | ...       | ...
SciFi                   | X-Men Origins: Wolverine  | ...       | 2009
...                     | ...                       | ...       | ...
War                     | Defiance                  | ...       | 2008

The system can split this table into partition ranges (for example Action-Animation and Comedy-War) and serve each range from a different server.

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens - Seriously

A query can return a continuation token instead of (or along with) results when it hits:
• The maximum of 1,000 rows in a response
• The end of a partition range boundary
• The maximum of 5 seconds to execute the query
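As a sketch of the pattern (reusing the AttendeeEntity type and Attendees table from the code later in this deck, and assuming tableClient is a CloudTableClient): wrapping the LINQ query with AsTableServiceQuery() yields a CloudTableQuery<T> whose enumeration follows continuation tokens for you instead of silently stopping at the first 1,000 rows.

// Sketch: let CloudTableQuery<T> follow continuation tokens across the
// 1,000-row limit, partition range boundaries, and the 5-second limit.
TableServiceContext context = tableClient.GetDataServiceContext();

CloudTableQuery<AttendeeEntity> query =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool"
     select a).AsTableServiceQuery();

foreach (AttendeeEntity attendee in query)
{
    // Each step of the enumeration may issue another request behind the scenes.
    Console.WriteLine(attendee.Email);
}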

Tables Recap

• Select PartitionKey and RowKey that help scale
  • Efficient for frequently used queries
  • Supports batch (entity group) transactions
  • Distributes load

• Avoid "append only" patterns
  • Distribute writes by using a hash etc. as a key prefix (see the sketch after this list)

• Always handle continuation tokens
  • Expect continuation tokens for range queries

• "OR" predicates are not optimized
  • Execute the queries that form the "OR" predicates as separate queries

• Implement a back-off strategy for retries
  • "Server busy" means the load on a single partition has exceeded its limits
  • The system load balances partitions to meet traffic needs
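A small illustration of the "distribute by using a hash as prefix" advice; the bucket count, the time-based key, and the entity variable (any TableServiceEntity) are made up for the example:

// Sketch: avoid an append-only partition by bucketing new entities.
string naturalKey = DateTime.UtcNow.ToString("yyyyMMddHHmmssfff");  // e.g. a time-based key
int bucket = Math.Abs(naturalKey.GetHashCode()) % 16;               // 16 buckets, illustrative
entity.PartitionKey = bucket.ToString("D2") + "-" + naturalKey;     // spreads inserts across 16 partitions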

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
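A sketch of those tips in code; the query values are illustrative and tableClient is a CloudTableClient as in the TableHelper code later in this deck:

// A fresh context per logical operation avoids AddObject/AttachTo
// exceptions caused by entities that are already being tracked.
TableServiceContext context = tableClient.GetDataServiceContext();

// Without this, a point query for a missing entity throws instead of returning nothing.
context.IgnoreResourceNotFoundException = true;

AttendeeEntity attendee =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.com"
     select a).AsTableServiceQuery().Execute().FirstOrDefault();    // null if not found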

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid in scaling and performance

• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Each message is limited to 8 KB in size

• Commonly used with the work ticket pattern (see the sketch below)

• Why not simply use a table?
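A minimal sketch of the work ticket pattern mentioned above, reusing the AccountInformation helper from later in this deck; the container, blob, queue names, and the largeJobDescriptionXml payload are made up. The web role stores the large payload as a blob and enqueues only a small reference to it.

// Web role side: store the payload in a blob, enqueue a small "ticket".
CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("job-data");
container.CreateIfNotExist();
CloudBlob blob = container.GetBlobReference("job-42.xml");           // illustrative name
blob.UploadText(largeJobDescriptionXml);                             // payload can exceed 8 KB

CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage("job-data/job-42.xml"));      // the ticket stays tiny

// Worker role side: dequeue the ticket, fetch the blob, process it,
// then delete the message (and eventually the blob).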

Queue Terminology and Message Lifecycle

(Diagram: a Web Role calls PutMessage to add Msg 1-4 to a queue; Worker Roles call GetMessage with a visibility timeout to dequeue a message, process it, and then call RemoveMessage to delete it.)

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
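The same lifecycle through the StorageClient library, as a sketch; the queue name, the 30-second timeout, and the ProcessMessage helper are illustrative:

CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("myqueue");
queue.CreateIfNotExist();

// PutMessage
queue.AddMessage(new CloudQueueMessage("<test>test</test>"));

// GetMessage with a visibility timeout: the message becomes invisible to
// other consumers for 30 seconds but is not yet removed from the queue.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    ProcessMessage(msg.AsString);   // illustrative worker-role method

    // RemoveMessage: the delete uses the pop receipt carried inside msg.
    queue.DeleteMessage(msg);
}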

Truncated Exponential Back-Off Polling

Consider a back-off polling approach:
• Each empty poll increases the polling interval by 2x
• A successful poll resets the interval back to 1
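A sketch of such a polling loop in a worker role; the 1-second floor, the 60-second ceiling, and the queue/ProcessMessage names carry over from the earlier sketches and are illustrative:

// Truncated exponential back-off polling.
TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);     // truncation point (illustrative)

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        ProcessMessage(msg.AsString);                // illustrative
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);          // successful poll: reset to 1 s
    }
    else
    {
        Thread.Sleep(interval);                      // empty poll: wait, then double
        interval = TimeSpan.FromSeconds(
            Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
    }
}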


Removing Poison Messages

(Diagram: producers P1 and P2 put messages on a queue; consumers C1 and C2 dequeue them. Each message carries a dequeue count.)

1. C1: GetMessage(Q, 30 s) -> msg 1
2. C2: GetMessage(Q, 30 s) -> msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) -> msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) -> msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) - msg 1 is treated as a poison message and removed

Queues Recap

• Make message processing idempotent
  • Then there is no need to deal with processing failures

• Do not rely on message order
  • Messages that become invisible (and visible again) can arrive out of order

• Use the dequeue count to remove poison messages
  • Enforce a threshold on a message's dequeue count (see the sketch below)

• Messages larger than 8 KB
  • Use a blob to store the message data and put a reference to it in the message
  • Batch messages where possible
  • Garbage collect orphaned blobs

• Use the message count to scale
  • Dynamically increase or reduce the number of workers
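A sketch of the dequeue-count guard; the threshold of 3 is illustrative, and queue/ProcessMessage are the same placeholders used in the earlier queue sketches:

const int maxDequeueCount = 3;                      // illustrative threshold

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > maxDequeueCount)
    {
        // The message has failed too many times: treat it as poison.
        // Optionally copy it to a "dead letter" blob or table before deleting.
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg.AsString);               // must be idempotent
        queue.DeleteMessage(msg);
    }
}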

Windows Azure Storage Takeaways

Blobs, Drives, Tables, and Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                      A Quick Exercise

...Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))   // nothing to delete if the attendee is unknown
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                      Best Practices

                                                      Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances
  • If you scale better than linearly across cores, larger VMs could save you money
  • It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                      Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code

• You're paying for the entire VM, so why not use it?
  • Common mistake: splitting code into multiple roles, each not using up its CPU
  • Balance using up the CPU against having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

                                                      Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
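A small sketch of both flavors with the .NET 4 Task Parallel Library (System.Threading.Tasks); Transform, DownloadInput, DeleteOldBlobs, and workItems are placeholders for your own work:

// Data parallelism: fan a CPU-bound transformation across all cores.
Parallel.ForEach(workItems, item => Transform(item));

// Task parallelism: run independent stages concurrently and wait for both.
Task download = Task.Factory.StartNew(() => DownloadInput());
Task cleanup  = Task.Factory.StartNew(() => DeleteOldBlobs());
Task.WaitAll(download, cleanup);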

                                                      Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                      Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
  • Spinning VMs up and down automatically is good at large scale
  • Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience
  • Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs (performance vs. cost)

                                                      Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile

• Caching and compressing help a lot with storage costs

                                                      Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile

• Saving bandwidth costs often leads to savings in other places
  • Sending fewer things over the wire often means getting fewer things from storage
  • Sending fewer things means your VM has time to do other tasks

• All of these tips have the side benefit of improving your web app's performance and user experience

                                                      Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

(Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.)
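A sketch of tip 1 for an ASP.NET web role, assuming the compression is wired up in Global.asax (an HttpModule works the same way); requires System.IO.Compression:

// Gzip the response when the client advertises support for it.
protected void Application_BeginRequest(object sender, EventArgs e)
{
    HttpRequest request = HttpContext.Current.Request;
    HttpResponse response = HttpContext.Current.Response;

    string acceptEncoding = request.Headers["Accept-Encoding"];
    if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
    {
        response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
        response.AppendHeader("Content-Encoding", "gzip");
    }
}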

                                                      Best Practices Summary

                                                      Doing lsquolessrsquo is the key to saving costs

                                                      Measure everything

                                                      Know your application profile in and out

                                                      Research Examples in the Cloud

...on another set of slides

                                                      Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion...

                                                      Thank you for hosting me at the Summer School

                                                      BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • Data-transfer overhead
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting time in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation

(Diagram: a splitting task fans out into many parallel BLAST tasks, followed by a merging task.)

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition was the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

(Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and database-updating role; a global dispatch queue feeds the worker roles; Azure Tables hold the job registry, and Azure Blobs hold the BLAST/NCBI databases and temporary data. A splitting task fans out into parallel BLAST tasks, followed by a merging task.)

Web Portal and Web Service
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• An accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

(Diagram: the web portal and web service hand job registrations to the job scheduler, which works with the scaling engine and the job registry.)

Case study: R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)

• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time...

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs that we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM), across four data centers: US (2), Western Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place...


A normal log record should look like this (otherwise something is wrong, e.g. a task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total

(Log analysis: all 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart - this is an update domain rolling through the deployment. Separately, 35 nodes experienced blob-writing failures at the same time.)

West Europe data center: 30,976 tasks were completed before the job was killed

(A reasonable guess: the fault domain was at work.)

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." - Irish proverb

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Penman-Monteith (1964):

    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

where
  ET = water volume evapotranspired (m^3 s^-1 m^-2)
  Δ  = rate of change of saturation specific humidity with air temperature (Pa K^-1)
  λv = latent heat of vaporization (J g^-1)
  Rn = net radiation (W m^-2)
  cp = specific heat capacity of air (J kg^-1 K^-1)
  ρa = dry air density (kg m^-3)
  δq = vapor pressure deficit (Pa)
  ga = conductivity of air (inverse of ra) (m s^-1)
  gs = conductivity of plant stoma to air (inverse of rs) (m s^-1)
  γ  = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

Data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR data: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal accepts requests on a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 Queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery is pulled from the download sites, source metadata is persisted, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: the MODISAzure Service (Web Role) persists each <PipelineStage>Request as <PipelineStage>JobStatus and enqueues it on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request is parsed by the Service Monitor (Worker Role) and persisted as ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), then queued on the Job and Task Queues and dispatched to GenericWorker (Worker Role) instances. The SwathGranuleMeta table is queried for geo-metadata (e.g. boundaries) for each swath tile, and the ScanTimeList table for the list of satellite scan times that cover a target tile; workers read Swath Source Data Storage and write Reprojection Data Storage.]

• Computational costs driven by data scale and the need to run reductions multiple times
• Storage costs driven by data scale and the 6 month project duration
• Small with respect to the people costs, even at graduate student rates

Cost summary for the AzureMODIS pipeline (Service Web Role Portal, Download, Reprojection, Reduction 1, and Reduction 2 Queues):
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers – $420 cpu, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers – $216 cpu, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers – $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest scale computer centers ever constructed and have the potential to be important to both large and small scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                      • Day 2 - Azure as a PaaS
                                                      • Day 2 - Applications

Blob Features and Functions
• Store large objects (up to 1 TB in size)
• Can be served through the Windows Azure CDN service
• Standard REST interface
  • PutBlob – inserts a new blob or overwrites an existing blob
  • GetBlob – gets a whole blob or a specific range
  • DeleteBlob
  • CopyBlob
  • SnapshotBlob
  • LeaseBlob
(a small sketch of the basic operations follows)
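A minimal sketch of these operations through the .NET StorageClient library used in the code samples later in this deck; the account name, key, container, and blob names are placeholders, not values from the talk.

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class BlobBasics
    {
        public static void Run()
        {
            // Placeholder credentials - substitute your own account name and key.
            var account = new CloudStorageAccount(
                new StorageCredentialsAccountAndKey("myaccount", "myKey"), false);
            CloudBlobClient client = account.CreateCloudBlobClient();

            // Create the container if it does not exist yet.
            CloudBlobContainer container = client.GetContainerReference("demo");
            container.CreateIfNotExist();

            // PutBlob: insert a new blob or overwrite an existing one.
            CloudBlob blob = container.GetBlobReference("hello.txt");
            blob.UploadText("Hello, blob storage");

            // GetBlob: read the whole blob back.
            string text = blob.DownloadText();

            // SnapshotBlob and DeleteBlob.
            blob.CreateSnapshot();
            blob.DeleteIfExists();
        }
    }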

Two Types of Blobs Under the Hood
• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob

Windows Azure Drive
• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive (the backing Page Blob) persists even when not mounted
(a mounting sketch follows)
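A rough sketch of mounting a drive with the CloudDrive API that shipped in the Windows Azure SDK of this era; the blob name, sizes, and cache path are placeholders, and the exact signatures should be treated as assumptions to verify against the SDK documentation.

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;
    // Requires the Microsoft.WindowsAzure.CloudDrive assembly from the Windows Azure SDK.

    var account = new CloudStorageAccount(
        new StorageCredentialsAccountAndKey("myaccount", "myKey"), false);   // placeholders

    // Reserve a local cache for drive I/O (the path typically comes from a LocalStorage resource).
    CloudDrive.InitializeCache(localCachePath, 128);          // cache size in MB; placeholder values

    // The drive is backed by a page blob, e.g. drives/mydrive.vhd (placeholder name).
    CloudDrive drive = account.CreateCloudDrive("drives/mydrive.vhd");
    drive.Create(64);                                          // create a 64 MB drive if it does not exist
    string drivePath = drive.Mount(25, DriveMountOptions.None); // returns something like "X:\"

    // ... use normal NTFS file APIs against drivePath ...
    drive.Unmount();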

Windows Azure Tables
• Provides structured storage
• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows
• Highly available & durable
  • Data is replicated several times
• Familiar and easy to use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues
• Queues are performance efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message can be processed at least once
• Access is provided via REST
(a small sketch of the basic queue calls follows)
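A minimal sketch of the basic queue calls through the StorageClient library used later in this deck; the account credentials, queue name, and message text are placeholders.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    var account = new CloudStorageAccount(
        new StorageCredentialsAccountAndKey("myaccount", "myKey"), false);   // placeholders
    CloudQueueClient queueClient = account.CreateCloudQueueClient();
    CloudQueue queue = queueClient.GetQueueReference("workitems");
    queue.CreateIfNotExist();

    // Producer side: enqueue a small message.
    queue.AddMessage(new CloudQueueMessage("tile-42"));

    // Consumer side: dequeue with a 30 second visibility timeout, process, then delete.
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        // ... process msg.AsString ...
        queue.DeleteMessage(msg);
    }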

Storage Partitioning
Understanding partitioning is key to understanding performance.
• Every data object has a partition key; it is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

System load balancing
• Use exponential backoff on "Server Busy"
• "Server Busy" means either that the system is load balancing to meet your traffic needs, or that the limits of a single partition have been reached

Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

  PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
  1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
  1                         | Order – 1             |              |                     | $35.12
  2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
  2                         | Order – 3             |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

  Container Name | Blob Name
  image          | annarborbighouse.jpg
  image          | foxboroughgillette.jpg
  video          | annarborbighouse.jpg

• Messages – Queue name: all messages for a single queue belong to the same partition

  Queue    | Message
  jobs     | Message1
  jobs     | Message2
  workflow | Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

Partition range served by one server:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | …         | 2009
  Action                  | The Bourne Ultimatum     | …         | 2007
  …                       | …                        | …         | …
  Animation               | Open Season 2            | …         | 2009
  Animation               | The Ant Bully            | …         | 2006

Partition range served by another server:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Comedy                  | Office Space             | …         | 1999
  …                       | …                        | …         | …
  SciFi                   | X-Men Origins: Wolverine | …         | 2009
  …                       | …                        | …         | …
  War                     | Defiance                 | …         | 2008

Full table, range partitioned across servers:

  PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
  Action                  | Fast & Furious           | …         | 2009
  Action                  | The Bourne Ultimatum     | …         | 2007
  …                       | …                        | …         | …
  Animation               | Open Season 2            | …         | 2009
  Animation               | The Ant Bully            | …         | 2006
  …                       | …                        | …         | …
  Comedy                  | Office Space             | …         | 1999
  …                       | …                        | …         | …
  SciFi                   | X-Men Origins: Wolverine | …         | 2009
  …                       | …                        | …         | …
  War                     | Defiance                 | …         | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously
A continuation token is returned when:
• A maximum of 1000 rows is reached in a response
• The end of a partition range boundary is reached
• The query hits the maximum of 5 seconds of execution time
(a handling sketch follows)
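A minimal sketch of handling continuation transparently with the StorageClient library, assuming the TableServiceContext and AttendeeEntity type from the code samples later in this deck; the table and partition names are the ones used there.

    // AsTableServiceQuery() wraps the LINQ query in a CloudTableQuery whose enumeration
    // transparently follows continuation tokens across 1000-row pages and partition
    // range boundaries.
    CloudTableQuery<AttendeeEntity> query =
        (from a in context.CreateQuery<AttendeeEntity>("Attendees")
         where a.PartitionKey == "SummerSchool"
         select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query)
    {
        // Each iteration may trigger another request behind the scenes.
    }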

Tables Recap
• Select a PartitionKey and RowKey that help scale: efficient for frequently used queries, supports batch transactions, and distributes load
• Avoid "append only" patterns: distribute by using a hash or similar as a key prefix
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server busy" means either that partitions are being load balanced to meet traffic needs, or that the load on a single partition has exceeded the limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
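A minimal sketch of the last point, assuming the CloudTableClient and AttendeeEntity from the later TableHelper example; with IgnoreResourceNotFoundException set, a point query for a missing entity yields an empty result instead of throwing.

    TableServiceContext context = tableClient.GetDataServiceContext();
    context.IgnoreResourceNotFoundException = true;

    // Point query: PartitionKey + RowKey identify exactly one entity.
    AttendeeEntity attendee =
        (from a in context.CreateQuery<AttendeeEntity>("Attendees")
         where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.com"  // placeholder key
         select a).FirstOrDefault();   // null when the entity does not exist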

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This decoupling can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size
  • Commonly used with the work ticket pattern (see the sketch below)
• Why not simply use a table?
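A minimal sketch of the work ticket pattern mentioned above: the payload lives in blob storage and the queue message carries only a small reference to it. It assumes the CloudBlobContainer and CloudQueue objects from the earlier sketches; the blob name is a placeholder.

    // Web role: store the large payload as a blob, then enqueue a small "work ticket".
    CloudBlob payload = container.GetBlobReference("uploads/order-12345.xml");   // placeholder name
    payload.UploadText(orderXml);
    queue.AddMessage(new CloudQueueMessage("uploads/order-12345.xml"));

    // Worker role: the ticket tells the worker which blob to fetch and process.
    CloudQueueMessage ticket = queue.GetMessage();
    if (ticket != null)
    {
        string blobName = ticket.AsString;
        string orderData = container.GetBlobReference(blobName).DownloadText();
        // ... process orderData ...
        queue.DeleteMessage(ticket);
    }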

Queue Terminology

Message Lifecycle
[Diagram: a Web Role calls PutMessage to place Msg 1-4 on the queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue messages and RemoveMessage to delete them once processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling
• Consider a backoff polling approach: each empty poll increases the polling interval by 2x
• A successful poll sets the interval back to 1
(see the sketch below)
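A minimal sketch of a worker-role polling loop with truncated exponential backoff; the interval bounds are placeholders, `queue` is a CloudQueue as in the earlier sketches, and ProcessMessage is a hypothetical handler.

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    TimeSpan minInterval = TimeSpan.FromSeconds(1);    // placeholder bounds
    TimeSpan maxInterval = TimeSpan.FromSeconds(60);
    TimeSpan interval = minInterval;

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            ProcessMessage(msg);                        // hypothetical handler
            queue.DeleteMessage(msg);
            interval = minInterval;                     // successful poll: reset the interval
        }
        else
        {
            Thread.Sleep(interval);                     // empty poll: wait, then double (truncated at max)
            interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
        }
    }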

Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them. The step numbers below follow one message that repeatedly fails processing.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2, so msg 1 is treated as a poison message
13. C1: Delete(Q, msg 1)

(a DequeueCount-based sketch follows)
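A minimal sketch of the poison-message check using the DequeueCount property exposed on CloudQueueMessage; the threshold and the "park it in a blob" dead-letter handling are placeholders.

    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        if (msg.DequeueCount > 2)
        {
            // Poison message: it has already failed processing repeatedly.
            // Park it somewhere for later inspection (placeholder), then remove it.
            poisonContainer.GetBlobReference(msg.Id + ".txt").UploadText(msg.AsString);
            queue.DeleteMessage(msg);
        }
        else
        {
            ProcessMessage(msg);          // hypothetical handler
            queue.DeleteMessage(msg);
        }
    }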

Queues Recap
• No need to deal with failures: make message processing idempotent
• Invisible messages result in out-of-order delivery: do not rely on order
• Enforce a threshold on a message's dequeue count: use DequeueCount to remove poison messages
• Messages > 8 KB, batched messages, garbage collecting orphaned blobs: use a blob to store the message data, with a reference in the message
• Dynamically increase/reduce workers: use the message count to scale

Windows Azure Storage Takeaways
• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                        A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

    public class AccountInformation
    {
        private static string storageKey = "tHiSiSnOtMyKeY";
        private static string accountName = "jjstore";
        private static StorageCredentialsAccountAndKey credentials;

        internal static StorageCredentialsAccountAndKey Credentials
        {
            get
            {
                if (credentials == null)
                    credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
                return credentials;
            }
        }
    }

Code – BlobHelper.cs

    public class BlobHelper
    {
        private static string defaultContainerName = "school";
        private CloudBlobClient client = null;
        private CloudBlobContainer container = null;

        private void InitContainer()
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();
            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }

Code – BlobHelper.cs

    public void WriteFileToBlob(string filePath)
    {
        if (client == null || container == null)
            InitContainer();
        FileInfo file = new FileInfo(filePath);
        CloudBlob blob = container.GetBlobReference(file.Name);
        blob.Properties.ContentType = GetContentType(file.Extension);
        blob.UploadFile(file.FullName);

        // Or if you want to write a string, replace the last line with:
        //     blob.UploadText(someString);
        // And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
    }

Code – BlobHelper.cs

    public string GetBlobText(string blobName)
    {
        if (client == null || container == null)
            InitContainer();
        CloudBlob blob = container.GetBlobReference(blobName);
        try
        {
            return blob.DownloadText();
        }
        catch (Exception)
        {
            // The blob probably does not exist or there is no connection available.
            return null;
        }
    }

Application Code – Blobs

    private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
    {
        StringBuilder buff = new StringBuilder();
        buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
        foreach (AttendeeEntity attendee in attendees)
            buff.AppendLine(attendee.ToCsvString());
        blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
    }

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case:
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

    using Microsoft.WindowsAzure.StorageClient;

    public class AttendeeEntity : TableServiceEntity
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Email { get; set; }
        public DateTime Birthday { get; set; }
        public string FavoriteIceCream { get; set; }
        public int YearsInPhD { get; set; }
        public bool Graduated { get; set; }
        // ...
    }

Code – TableEntities

    public void UpdateFrom(AttendeeEntity other)
    {
        FirstName = other.FirstName;
        LastName = other.LastName;
        Email = other.Email;
        Birthday = other.Birthday;
        FavoriteIceCream = other.FavoriteIceCream;
        YearsInPhD = other.YearsInPhD;
        Graduated = other.Graduated;
        UpdateKeys();
    }

    public void UpdateKeys()
    {
        PartitionKey = "SummerSchool";
        RowKey = Email;
    }

Code – TableHelper.cs

    public class TableHelper
    {
        private CloudTableClient client = null;
        private TableServiceContext context = null;
        private Dictionary<string, AttendeeEntity> allAttendees = null;
        private string tableName = "Attendees";

        private CloudTableClient Client
        {
            get
            {
                if (client == null)
                    client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
                return client;
            }
        }

        private TableServiceContext Context
        {
            get
            {
                if (context == null)
                    context = Client.GetDataServiceContext();
                return context;
            }
        }
    }

Code – TableHelper.cs

    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();
        CloudTableQuery<AttendeeEntity> query =
            Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
        try
        {
            foreach (AttendeeEntity attendee in query)
                allAttendees[attendee.Email] = attendee;
        }
        catch (Exception)
        {
            // No entries in table - or other exception.
        }
    }

Code – TableHelper.cs

    public void DeleteAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (!allAttendees.ContainsKey(email))
            return;
        AttendeeEntity attendee = allAttendees[email];

        // Delete from the cloud table.
        Context.DeleteObject(attendee);
        Context.SaveChanges();

        // Delete from the memory cache.
        allAttendees.Remove(email);
    }

Code – TableHelper.cs

    public AttendeeEntity GetAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (allAttendees.ContainsKey(email))
            return allAttendees[email];
        return null;
    }

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

    public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
    {
        foreach (AttendeeEntity attendee in updatedAttendees)
            UpdateAttendee(attendee, false);
        Context.SaveChanges(SaveChangesOptions.Batch);
    }

    public void UpdateAttendee(AttendeeEntity attendee)
    {
        UpdateAttendee(attendee, true);
    }

    private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
    {
        if (allAttendees.ContainsKey(attendee.Email))
        {
            AttendeeEntity existingAttendee = allAttendees[attendee.Email];
            existingAttendee.UpdateFrom(attendee);
            Context.UpdateObject(existingAttendee);
        }
        else
        {
            Context.AddObject(tableName, attendee);
        }
        if (saveChanges)
            Context.SaveChanges();
    }

Application Code – Cloud Tables

    private void SaveButton_Click(object sender, RoutedEventArgs e)
    {
        // Write to table
        tableHelper.UpdateAttendees(attendees);
    }

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum
Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library
    • Data parallelism
    • Task parallelism
(a small TPL sketch follows)
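A minimal Task Parallel Library sketch illustrating both flavors mentioned above; the work items and method names are placeholders.

    using System.Threading.Tasks;

    class TplSketch
    {
        static void Run(int[] tiles)
        {
            // Data parallelism: apply the same operation to every element, across cores.
            Parallel.ForEach(tiles, tile =>
            {
                ProcessTile(tile);                      // hypothetical per-item work
            });

            // Task parallelism: run independent operations concurrently and wait for both.
            Task download = Task.Factory.StartNew(() => DownloadInputs());      // hypothetical
            Task cleanup  = Task.Factory.StartNew(() => CleanupScratchDisk());  // hypothetical
            Task.WaitAll(download, cleanup);
        }

        static void ProcessTile(int tile) { /* ... */ }
        static void DownloadInputs() { /* ... */ }
        static void CleanupScratchDisk() { /* ... */ }
    }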

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• Trade-off between performance (the risk of failure or poor user experience due to not having excess capacity) and cost (idling VMs)

Storage Costs
• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content is passed through Gzip and JavaScript/CSS/image minification to produce compressed content.]

(a gzip sketch follows)
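A minimal sketch of gzip-compressing a response body with the standard GZipStream class; in an ASP.NET role you would typically attach this as a response filter and set the Content-Encoding header, and the method name here is a placeholder.

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static byte[] GzipBytes(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();   // send with "Content-Encoding: gzip"
        }
    }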

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud
…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

                                                        Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

                                                        A simple SplitJoin pattern

                                                        Leverage multi-core of one instance bull argument ldquondashardquo of NCBI-BLAST

                                                        bull 1248 for small middle large and extra large instance size

                                                        Task granularity bull Large partition load imbalance

                                                        bull Small partition unnecessary overheads bull NCBI-BLAST overhead

                                                        bull Data transferring overhead

                                                        Best Practice test runs to profiling and set size to mitigate the overhead

Value of visibilityTimeout for each BLAST task: essentially an estimate of the task run time.
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure
Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
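A minimal sketch of dequeuing a BLAST task with an estimated visibility timeout, using the CloudQueue API from the v1.x Microsoft.WindowsAzure.StorageClient library. EstimateRunMinutes and RunBlastTask are hypothetical helpers standing in for the pair-base estimate and the worker logic described above; credentials is a StorageCredentialsAccountAndKey as used elsewhere in this deck.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueClient queueClient =
    new CloudStorageAccount(credentials, false).CreateCloudQueueClient();
CloudQueue taskQueue = queueClient.GetQueueReference("blasttasks");   // illustrative queue name

// Estimate from the number of pair-bases in the partition plus test runs,
// then clamp to the 2-hour maximum visibility timeout.
TimeSpan estimate = TimeSpan.FromMinutes(EstimateRunMinutes(partition));   // hypothetical helper
if (estimate > TimeSpan.FromHours(2)) estimate = TimeSpan.FromHours(2);

CloudQueueMessage msg = taskQueue.GetMessage(estimate);   // message stays invisible for 'estimate'
if (msg != null)
{
    RunBlastTask(msg.AsString);      // hypothetical worker logic
    taskQueue.DeleteMessage(msg);    // delete only after successful completion
}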

Figure (Split/Join): a Splitting task fans out into many BLAST tasks, which feed a single Merging task.

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

Figure (AzureBLAST architecture): a Web Role hosts the web portal, web service, job registration, and job scheduler, and feeds a global dispatch queue; worker roles run the Splitting task, the BLAST tasks, and the Merging task; a Job Management Role and a database-updating Role, together with a Scaling Engine, use Azure Tables (job registry, NCBI databases) and Azure Blobs (BLAST databases, temporary data, etc.).

An ASP.NET program hosted by a web role instance:
• Submit jobs
• Track a job's status and logs
Authentication/authorization is based on Live ID.
The accepted job is stored into the job registry table; for fault tolerance, avoid in-memory state.

Figure (Web Role internals): the Web Role contains the job portal, the web service, job registration, the job scheduler, the scaling engine, and the job registry.

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs: discover the interrelationships of known protein sequences.

"All against all" query:
• The database is also the input query
• The protein database is large (4.2 GB); 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation:
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know:
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four data centers: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalance occurs, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should show each "Executing" entry paired with a "done" entry; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, about 30 minutes apart: this is an update domain rolling through.
• 35 nodes experienced blob-writing failures at the same time.

West Europe data center: 30,976 tasks were completed and then the job was killed.
• A reasonable guess: the fault domain was at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma air (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

$ET = \dfrac{\Delta R_n + \rho_a c_p \,\delta q \, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$

Penman-Monteith (1964)

                                                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
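To make the reduction step concrete, here is a minimal sketch of the Penman-Monteith calculation as a C# helper. Variable names follow the definitions above; the caller is assumed to supply values in the listed units, which is an assumption about how the inputs have already been prepared.

// ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
static double ComputeET(
    double delta,     // Δ:  Pa/K
    double rn,        // Rn: W/m^2
    double rhoA,      // ρa: kg/m^3
    double cp,        // cp: J/(kg·K)
    double deltaQ,    // δq: Pa
    double ga,        // ga: m/s
    double gs,        // gs: m/s
    double lambdaV,   // λv: J/g
    double gamma)     // γ:  Pa/K (≈ 66)
{
    double numerator   = delta * rn + rhoA * cp * deltaQ * ga;
    double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
    return numerator / denominator;
}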

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year.

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

Figure (AzureMODIS pipeline): the AzureMODIS Service Web Role Portal places user requests on a Request Queue; the Data Collection Stage pulls from a Download Queue and fetches source imagery from the download sites; the Reprojection Stage, Derivation Reduction Stage, and Analysis Reduction Stage are driven by the Reprojection, Reduction 1, and Reduction 2 Queues, with source metadata kept alongside; scientists download the scientific results.

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks, i.e. recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

Figure (job dispatch): a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work from the <PipelineStage>Job Queue onto the <PipelineStage>Task Queue.

All work is actually done by a GenericWorker (Worker Role):
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

(Figure: the Service Monitor parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue, from which GenericWorker instances pull tasks and read <Input>Data Storage.)

Figure (reprojection dispatch and tables): a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches via the Job Queue onto the Task Queue consumed by GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Approximate per-stage breakdown (AzureMODIS Service Web Role Portal front end):

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                        • Day 2 - Azure as a PaaS
                                                        • Day 2 - Applications

Two Types of Blobs Under the Hood

• Block Blob
  • Targeted at streaming workloads
  • Each blob consists of a sequence of blocks; each block is identified by a Block ID (see the sketch below)
  • Size limit: 200 GB per blob
• Page Blob
  • Targeted at random read/write workloads
  • Each blob consists of an array of pages; each page is identified by its offset from the start of the blob
  • Size limit: 1 TB per blob
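A minimal sketch of the block-blob model with the v1.x StorageClient library: upload a large file as individual blocks, then commit the block list so the blob becomes visible atomically. The container variable, file name, and block size are illustrative assumptions.

// using System, System.IO, System.Collections.Generic, Microsoft.WindowsAzure.StorageClient
CloudBlockBlob blob = container.GetBlockBlobReference("bigfile.dat");
List<string> blockIds = new List<string>();

using (FileStream file = File.OpenRead(@"C:\data\bigfile.dat"))
{
    byte[] buffer = new byte[4 * 1024 * 1024];   // 4 MB blocks
    int read, blockNum = 0;
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Block IDs must be Base64 strings of equal length within one blob.
        string blockId = Convert.ToBase64String(BitConverter.GetBytes(blockNum++));
        blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
        blockIds.Add(blockId);
    }
}
blob.PutBlockList(blockIds);   // commit: uncommitted blocks are discarded after a week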

Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use
  • Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud
• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\ from http://<accountname>.blob.core.windows.net/<containername>/<blobname> (see the sketch below)
  • All writes to the drive are made durable to the Page Blob; the drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
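A minimal sketch of mounting a drive, assuming the CloudDrive class from the Windows Azure SDK 1.x (Microsoft.WindowsAzure.CloudDrive); the connection-setting name, local-resource name, blob URI, and sizes are placeholders, and the exact cache-initialization details may differ in a real role.

// Assumes a LocalStorage resource named "DriveCache" in the service definition.
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

CloudStorageAccount account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

// The backing Page Blob; placeholder URI.
CloudDrive drive = account.CreateCloudDrive(
    "http://<accountname>.blob.core.windows.net/drives/mydatadrive.vhd");

drive.Create(1024);   // size in MB; throws if the Page Blob already exists
string drivePath = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);

// From here on, ordinary NTFS APIs work against drivePath, e.g. drivePath + @"\results.txt".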

Windows Azure Tables

• Provides structured storage
• Massively scalable tables: billions of entities (rows) and TBs of data; can use thousands of servers as traffic grows
• Highly available and durable: data is replicated several times
• Familiar and easy-to-use API: WCF Data Services and OData; .NET classes and LINQ; REST, with any platform or language

Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message is processed at least once
• Access is provided via REST (see the sketch below)
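A minimal sketch of the queue workflow with the v1.x StorageClient API (the same operations are available over REST). The queue name, message body, and the credentials variable are illustrative; processing must tolerate at-least-once delivery.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueClient queueClient =
    new CloudStorageAccount(credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("workitems");
queue.CreateIfNotExist();

queue.AddMessage(new CloudQueueMessage("process-item-42"));   // producer side

CloudQueueMessage msg = queue.GetMessage();   // consumer side: message becomes invisible
if (msg != null)
{
    // ... process msg.AsString; keep the work idempotent ...
    queue.DeleteMessage(msg);                 // remove it once the work is done
}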

Storage Partitioning

Understanding partitioning is key to understanding performance.

• Every data object has a partition key; it is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load balances partitions based on traffic pattern
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

"Server Busy" responses:
• Either the system is load balancing to meet your traffic needs, or the single-partition limits have been reached
• Use exponential backoff on "Server Busy" (see the retry sketch below)
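A minimal sketch of retrying a storage call with exponential backoff when the service reports a server-side error such as "Server Busy" (HTTP 503). The operation delegate, retry count, and initial delay are illustrative choices, not prescribed values.

// using System, System.Threading, Microsoft.WindowsAzure.StorageClient
static void ExecuteWithBackoff(Action storageOperation)
{
    TimeSpan delay = TimeSpan.FromMilliseconds(100);   // initial backoff
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            storageOperation();
            return;
        }
        catch (StorageServerException)   // 5xx errors, including Server Busy
        {
            if (attempt >= 6) throw;     // give up eventually
            Thread.Sleep(delay);
            delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);   // double each time
        }
    }
}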

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition.

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

Blobs – Container name + Blob name: every blob and its snapshots are in a single partition.

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Messages – Queue name: all messages for a single queue belong to the same partition.

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity: up to 100 TB
• Transactions: up to a few thousand requests per second
• Bandwidth: up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff.

Partitions and Partition Ranges

The full table (PartitionKey = Category, RowKey = Title):

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

As traffic grows, the system can serve the table as partition ranges on different servers, e.g. one range from Action through Animation and another from Comedy through War.

Key Selection: Things to Consider

• Scalability: distribute load as much as possible; hot partitions can be load balanced; the PartitionKey is critical for scalability
• Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient
• Entity group transactions: transactions across a single partition; transaction semantics and fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

A query can return a continuation token (see the sketch below):
• At a maximum of 1000 rows in a response
• At the end of a partition range boundary
• At a maximum of 5 seconds of query execution time
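A minimal sketch of draining a query across continuation tokens with the WCF Data Services client. MovieEntity, the "Movies" table, the context variable, and the Process helper are hypothetical examples matching the sample data above.

var query = (DataServiceQuery<MovieEntity>)
    (from m in context.CreateQuery<MovieEntity>("Movies")
     where m.PartitionKey == "Action"
     select m);

var response = (QueryOperationResponse<MovieEntity>)query.Execute();
DataServiceQueryContinuation<MovieEntity> token = null;
do
{
    foreach (MovieEntity movie in response)   // at most 1000 rows per response
        Process(movie);                       // hypothetical consumer

    token = response.GetContinuation();       // null when there is nothing left
    if (token != null)
        response = context.Execute(token);    // fetch the next page
} while (token != null);

With the StorageClient library, wrapping the query with AsTableServiceQuery() will follow continuation tokens for you; the explicit loop above shows what happens underneath.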

Tables Recap

• Efficient for frequently used queries; supports batch transactions; distributes load
• Select a PartitionKey and RowKey that help scale: distribute load by using a hash or similar prefix
• Avoid "append only" patterns
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server busy" means partitions are being load balanced to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services

• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
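A minimal sketch of the point-query tip: with IgnoreResourceNotFoundException set on the context, a missing entity comes back as an empty result instead of an exception. AttendeeEntity and the "Attendees" table match the sample code later in this deck; the tableClient variable and email value are illustrative.

TableServiceContext context = tableClient.GetDataServiceContext();   // new context per logical operation
context.IgnoreResourceNotFoundException = true;

AttendeeEntity attendee =
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.com"
     select a).FirstOrDefault();   // null if the entity does not exist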

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This decoupling can aid in scaling and performance
• A queue can hold an unlimited number of messages; messages must be serializable as XML and are limited to 8 KB in size
• Commonly use the work ticket pattern (see the sketch below)
• Why not simply use a table?
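A minimal sketch of the work-ticket pattern: the real payload lives in a blob and the queue message carries only a small reference, which keeps messages under the 8 KB limit. The container and queue variables are assumed to be initialized as elsewhere in this deck.

// Producer: store the real payload in a blob, enqueue only a ticket.
string blobName = Guid.NewGuid().ToString();
CloudBlob payloadBlob = container.GetBlobReference(blobName);
payloadBlob.UploadText(largePayloadXml);              // payload can be any size
queue.AddMessage(new CloudQueueMessage(blobName));    // the ticket is well under 8 KB

// Consumer: dereference the ticket, do the work, then clean up.
CloudQueueMessage ticket = queue.GetMessage();
if (ticket != null)
{
    string payload = container.GetBlobReference(ticket.AsString).DownloadText();
    // ... process payload ...
    queue.DeleteMessage(ticket);
    container.GetBlobReference(ticket.AsString).Delete();   // garbage-collect the orphaned blob
}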

Queue Terminology and Message Lifecycle

Figure (message lifecycle): a Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; Worker Roles call GetMessage with a visibility timeout to retrieve a message, and RemoveMessage to delete it once processing completes.

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1 (see the sketch below).
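A minimal sketch of a worker's truncated exponential back-off poll loop: every empty poll doubles the sleep interval up to a cap, and a successful dequeue resets it. The one-second floor, 60-second cap, and ProcessMessage handler are illustrative.

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);   // truncation cap

while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        ProcessMessage(msg);                        // hypothetical handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);         // success: reset the interval
    }
    else
    {
        Thread.Sleep(interval);                     // empty poll: back off
        double doubled = interval.TotalSeconds * 2;
        interval = TimeSpan.FromSeconds(Math.Min(doubled, maxInterval.TotalSeconds));
    }
}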


Removing Poison Messages

(The original figures show two producers P1/P2, two consumers C1/C2, and the queue contents at each step; the sequence of events is as follows, with a handling sketch after it.)

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2, so it is treated as a poison message
13. C1: DeleteMessage(Q, msg 1)
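A minimal sketch of the poison-message guard from the sequence above: check DequeueCount when a message is retrieved and delete (or dead-letter) anything that has already failed too many times. The threshold and ProcessMessage handler are illustrative.

const int MaxDequeueCount = 3;

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > MaxDequeueCount)
    {
        // Poison message: it has repeatedly crashed a consumer. Remove it
        // (optionally log it or copy it to a dead-letter blob/table first).
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg);        // hypothetical handler; must be idempotent
        queue.DeleteMessage(msg);
    }
}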

Queues Recap

• Make message processing idempotent: then there is no need to deal with partial failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• Messages > 8 KB: use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs
• Use the message count to scale: dynamically increase/reduce the number of workers

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception
    }
}

                                                          59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];
    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();
    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                          60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
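For tables that do not fit in memory, a common alternative is to stream over the query results instead of caching them in a dictionary. A minimal sketch of such a method inside the same helper class (processAttendee is a hypothetical callback, not part of the original code):

private void ForEachAttendee(Action<AttendeeEntity> processAttendee)
{
    // Enumerating the CloudTableQuery streams entities from the table
    // instead of building an in-memory cache of the whole table.
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    foreach (AttendeeEntity attendee in query)
        processAttendee(attendee);
}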

                                                          61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                          62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                          63

Tools – Fiddler2

                                                          Best Practices

                                                          Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                          Using Your VM to the Maximum

Remember:

• 1 role instance == 1 VM running Windows

• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting code up into multiple roles, each not using much CPU

• Balance using up CPU vs. having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

                                                          Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)

• Data parallelism

• Task parallelism
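As a rough illustration of both flavors with the Task Parallel Library – Parallel.ForEach for data parallelism, Task.Factory for task parallelism – here is a minimal sketch; the work items and the called methods are invented for the example:

using System.Threading.Tasks;

public static void ProcessWorkItems(string[] workItems)
{
    // Data parallelism: the same operation applied to every item,
    // scheduled across the cores of the role instance.
    Parallel.ForEach(workItems, item => ProcessOneItem(item));

    // Task parallelism: independent operations run concurrently.
    Task cleanup = Task.Factory.StartNew(() => CleanUpTemporaryFiles());
    Task report = Task.Factory.StartNew(() => WriteStatusReport());
    Task.WaitAll(cleanup, report);
}

// ProcessOneItem, CleanUpTemporaryFiles, and WriteStatusReport are
// placeholders for whatever work the role actually does.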

                                                          Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                          Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled) – see the sketch below

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• It is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs
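One simple monitoring signal is the depth of the queue that feeds your worker roles. A minimal sketch, assuming the StorageClient queue API used elsewhere in this deck; the queue name and the thresholds are invented for the example:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static string SuggestScalingAction(CloudStorageAccount account)
{
    CloudQueue workQueue = account.CreateCloudQueueClient().GetQueueReference("workitems");

    // Ask the service for the (approximate) number of queued messages.
    int backlog = workQueue.RetrieveApproximateMessageCount();

    // Thresholds are purely illustrative; tune them from your own measurements.
    if (backlog > 1000) return "Consider adding worker instances";
    if (backlog < 10) return "Consider removing idle worker instances";
    return "Instance count looks about right";
}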

Performance & Cost

                                                          Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g., SQL Azure has a flat fee, while Windows Azure Tables charge per transaction

• The service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                                          Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                          Compressing Content

1. Gzip all output content (see the sketch after this list)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
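For item 1, a minimal sketch of gzip-compressing responses in an ASP.NET web role by wrapping the response filter in a GZipStream from Global.asax; it only compresses when the client advertises gzip support:

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_PreRequestHandlerExecute(object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication)sender;
        string acceptEncoding = app.Request.Headers["Accept-Encoding"];

        // Only compress when the browser says it can decompress on the fly.
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            app.Response.Filter = new GZipStream(app.Response.Filter, CompressionMode.Compress);
            app.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}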

(Diagram: uncompressed content → Gzip, minify JavaScript, minify CSS, minify images → compressed content)

                                                          Best Practices Summary

Doing 'less' is the key to saving costs

                                                          Measure everything

                                                          Know your application profile in and out

                                                          Research Examples in the Cloud

…on another set of slides

                                                          Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                          76

                                                          Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                          77

Questions and Discussion…

                                                          Thank you for hosting me at the Summer School

                                                          BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700–1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)

• Needs special result-reduction processing

Large data volumes

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • batch job management
  • task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
• Best Practice: do test runs for profiling and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of an instance failure
• Best Practice: estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation

(Diagram: a Splitting task fans out to multiple BLAST tasks, whose outputs feed a Merging task.)

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

(Architecture diagram: a Web Portal and Web Service hosted in a Web Role handle job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work through a global dispatch queue to Worker instances; a database-updating Role refreshes the NCBI databases; Azure Tables hold the Job Registry, and Azure Blobs hold the BLAST databases, temporary data, etc.)

(Diagram: a Splitting task fans out to multiple BLAST tasks, whose outputs feed a Merging task.)

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

(Diagram components: Web Portal, Web Service, Job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry)

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances, redistribute the load manually

(Figure: deployment map with VM counts per datacenter.)

• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6–8 days

• Look into the log data to analyze what took place…


A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes in one group) – this is an Update Domain

35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and then the job was killed

A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

                                                          ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:

• Lots of inputs; big data reduction

• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                                          Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration or evaporation through plant membranes by plants.
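As a worked version of the formula above, a small sketch that computes ET directly from the symbols defined in the list; in a real run the inputs would come from the MODIS and FLUXNET data described below:

// Penman-Monteith, following the equation above.
// delta: Pa K-1, rn: W m-2, rhoA: kg m-3, cp: J kg-1 K-1, deltaQ: Pa,
// ga and gs: m s-1, gamma: Pa K-1 (≈ 66 per the definitions above), lambdaV: J g-1.
public static double ComputeEvapotranspiration(
    double delta, double rn, double rhoA, double cp,
    double deltaQ, double ga, double gs,
    double gamma, double lambdaV)
{
    double numerator = delta * rn + rhoA * cp * deltaQ * ga;
    double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
    return numerator / denominator;
}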

Inputs:

• NASA MODIS imagery source archives: 5 TB (600K files)

• FLUXNET curated sensor dataset: 30 GB (960 files)

• FLUXNET curated field dataset: 2 KB (1 file)

• NCEP/NCAR: ~100 MB (4K files)

• Vegetative clumping: ~5 MB (1 file)

• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Architecture diagram: the AzureMODIS Service Web Role Portal receives requests via a Request Queue; the Download Queue feeds the Data Collection Stage, which pulls source imagery from the download sites and records source metadata; the Reprojection, Reduction 1, and Reduction 2 Queues feed the Reprojection, Derivation Reduction, and Analysis Reduction Stages; scientists download the scientific results.)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.)

All work is actually done by a Worker Role (the GenericWorker):

• Sandboxes the science or other executable

• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

(Diagram: the Service Monitor parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.)

(Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches tasks via the Job and Task Queues to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.)

• ReprojectionJobStatus: each entity specifies a single reprojection job request

• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)

• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile

• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reductions multiple times

• Storage costs are driven by the data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate student rates

(Figure: the same pipeline as above, annotated with approximate scale and cost per stage.)

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; ~$50 upload, ~$450 storage

Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; ~$420 CPU, ~$60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; ~$216 CPU, ~$1 download, ~$6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; ~$216 CPU, ~$2 download, ~$9 storage

Total: ~$1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                          • Day 2 - Azure as a PaaS
                                                          • Day 2 - Applications

                                                            Windows Azure Drive

• Provides a durable NTFS volume for Windows Azure applications to use

• Use existing NTFS APIs to access a durable drive
  • Durability and survival of data on application failover
  • Enables migrating existing NTFS applications to the cloud

• A Windows Azure Drive is a Page Blob
  • Example: mount a Page Blob as X:\
    http://<accountname>.blob.core.windows.net/<containername>/<blobname>
  • All writes to the drive are made durable to the Page Blob
  • The drive is made durable through standard Page Blob replication
  • The drive persists as a Page Blob even when not mounted
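A minimal sketch of creating and mounting a drive, assuming the legacy Microsoft.WindowsAzure.CloudDrive API (CreateCloudDrive, Create, Mount); the local cache resource, container, VHD name, and sizes are invented for the example:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static string MountDataDrive(CloudStorageAccount account)
{
    // A local disk resource (declared in the service definition) backs the drive cache.
    LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
    CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

    // The drive lives in blob storage as a Page Blob (container/VHD names are examples).
    CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");
    try { drive.Create(64); }          // size in MB; fails harmlessly if it already exists
    catch (CloudDriveException) { }

    // Mount returns a local path (e.g., "X:\") that existing NTFS code can use.
    return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
}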

                                                            Windows Azure Tables

• Provides structured storage

• Massively scalable tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows

• Highly available & durable
  • Data is replicated several times

• Familiar and easy-to-use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

                                                            Windows Azure Queues

• Queues are performance efficient, highly available, and provide reliable message delivery

• Simple asynchronous work dispatch

• The programming semantics ensure that a message can be processed at least once

• Access is provided via REST

                                                            Storage Partitioning

                                                            Understanding partitioning is key to understanding performance

• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by a single server

• The system load balances partitions based on traffic patterns

• The partition key controls entity locality and is the unit of scale

• Load balancing can take a few minutes to kick in

• It can take a couple of seconds for a partition to become available on a different server

System load balancing:

• Use exponential backoff on "Server Busy"

• The system load balances to meet your traffic needs

• "Server Busy" can also mean the single-partition limits have been reached

                                                            Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind) | Name | CreditCardNumber | OrderTotal
1 | Customer-John Smith | John Smith | xxxx-xxxx-xxxx-xxxx |
1 | Order – 1 | | | $35.12
2 | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2 | Order – 3 | | | $10.00

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image | annarbor/bighouse.jpg
image | foxborough/gillette.jpg
video | annarbor/bighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue | Message
jobs | Message1
jobs | Message2
workflow | Message1

                                                            Scalability Targets

Storage Account
• Capacity – up to 100 TBs
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions

When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff (see the sketch below)
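A minimal sketch of an exponential-backoff retry wrapper; the delegate, attempt limit, and delays are invented for the example, and real code should inspect the storage exception and only retry transient errors such as "503 Server Busy":

using System;
using System.Threading;

public static void ExecuteWithBackoff(Action storageOperation, int maxAttempts)
{
    TimeSpan delay = TimeSpan.FromMilliseconds(500);   // initial backoff
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            storageOperation();
            return;
        }
        catch (Exception)
        {
            // In a real app, examine the exception and only retry transient
            // errors (e.g., 503 Server Busy); rethrow everything else.
            if (attempt >= maxAttempts)
                throw;
        }
        Thread.Sleep(delay);
        delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);   // double each time
    }
}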

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Action | Fast & Furious | … | 2009
Action | The Bourne Ultimatum | … | 2007
… | … | … | …
Animation | Open Season 2 | … | 2009
Animation | The Ant Bully | … | 2006

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Comedy | Office Space | … | 1999
… | … | … | …
SciFi | X-Men Origins: Wolverine | … | 2009
… | … | … | …
War | Defiance | … | 2008

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Action | Fast & Furious | … | 2009
Action | The Bourne Ultimatum | … | 2007
… | … | … | …
Animation | Open Season 2 | … | 2009
Animation | The Ant Bully | … | 2006
… | … | … | …
Comedy | Office Space | … | 1999
… | … | … | …
SciFi | X-Men Origins: Wolverine | … | 2009
… | … | … | …
War | Defiance | … | 2008

                                                            Partitions and Partition Ranges

Key Selection: Things to Consider

• Scalability: distribute load as much as possible; hot partitions can be load balanced; the PartitionKey is critical for scalability

• Query efficiency & speed: avoid frequent large scans; parallelize queries; point queries are most efficient

• Entity group transactions: transactions across a single partition; transaction semantics & fewer round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

A continuation token can be returned in any of these cases:

• Maximum of 1000 rows in a response

• At the end of a partition range boundary

• Maximum of 5 seconds to execute the query
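A minimal sketch of draining a query across continuation tokens with the WCF Data Services client used earlier (AttendeeEntity and the table name come from the earlier example; Process is a placeholder). Note that the StorageClient's AsTableServiceQuery()/CloudTableQuery follows continuation tokens for you when you simply enumerate it:

using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public void ReadAllAttendeesSegmented(TableServiceContext context)
{
    var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");
    QueryOperationResponse<AttendeeEntity> response =
        (QueryOperationResponse<AttendeeEntity>)query.Execute();

    DataServiceQueryContinuation<AttendeeEntity> token = null;
    do
    {
        if (token != null)
            response = context.Execute(token);       // fetch the next segment

        foreach (AttendeeEntity attendee in response)
            Process(attendee);                        // placeholder for real work

        token = response.GetContinuation();           // null when there are no more segments
    } while (token != null);
}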

Tables Recap

• Select a PartitionKey and RowKey that help you scale: efficient for frequently used queries, supports batch transactions, distributes load

• Avoid "append only" patterns: distribute by using a hash, etc., as a prefix

• Always handle continuation tokens: expect continuation tokens for range queries

• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries

• Implement a back-off strategy for retries: "Server busy" means either that partitions are being load balanced to meet traffic needs, or that the load on a single partition has exceeded the limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
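As a sketch of that last point (assuming a CloudTableClient named tableClient and the AttendeeEntity type and table used later in these slides; the email address is illustrative), setting IgnoreResourceNotFoundException lets a point query come back empty instead of throwing:

using System.Linq;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

TableServiceContext context = tableClient.GetDataServiceContext();
context.IgnoreResourceNotFoundException = true;    // empty result instead of a 404 exception

var query = (DataServiceQuery<AttendeeEntity>)
    (from a in context.CreateQuery<AttendeeEntity>("Attendees")
     where a.PartitionKey == "SummerSchool" && a.RowKey == "someone@example.org"
     select a);

// FirstOrDefault runs over the downloaded results, so it simply
// returns null when the entity does not exist.
AttendeeEntity attendee = query.Execute().FirstOrDefault();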

Queues: Their Unique Role in Building Reliable, Scalable Applications
• Want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages; messages must be serializable as XML
• Each message is limited to 8 KB in size
• Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology / Message Lifecycle

[Diagram: a Web Role adds messages to a queue with PutMessage; Worker Roles retrieve them with GetMessage (which hides the message for a visibility timeout) and, once processing succeeds, delete them with RemoveMessage.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
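The same lifecycle through the StorageClient library looks roughly like the sketch below; the queue name and message body are illustrative, and AccountInformation is the helper class shown later in these slides.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("myqueue");
queue.CreateIfNotExist();

queue.AddMessage(new CloudQueueMessage("work ticket 42"));           // PutMessage
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // GetMessage with a 30 s visibility timeout
// ... process msg.AsString ...
queue.DeleteMessage(msg);    // RemoveMessage; the pop receipt travels with msg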

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll sets the interval back to 1. (A sketch follows below.)
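A minimal sketch of such a truncated back-off loop, assuming a CloudQueue named queue; the 60-second cap and the ProcessMessage handler are illustrative assumptions.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);        // truncation point

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        ProcessMessage(msg);                            // placeholder work-ticket handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);             // successful poll: reset to 1
    }
    else
    {
        Thread.Sleep(interval);                         // empty poll: wait, then double
        double next = Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds);
        interval = TimeSpan.FromSeconds(next);
    }
}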

Removing Poison Messages

[Diagram (three slides): producers P1 and P2 and consumers C1 and C2 working against a queue; each message carries a dequeue count that increases every time it is retrieved.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 is restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) – the poison message is removed
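A sketch of the dequeue-count check a worker might apply, assuming a CloudQueue named queue; the threshold of 2 follows the walkthrough above, and Process is a placeholder.

using System;
using Microsoft.WindowsAzure.StorageClient;

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Poison message: it has already failed repeatedly, so stop retrying.
        // (Optionally park its content in a blob or table for later inspection.)
        queue.DeleteMessage(msg);
    }
    else
    {
        Process(msg);                // placeholder for the real work
        queue.DeleteMessage(msg);    // only delete after successful processing
    }
}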

Queues Recap

Make message processing idempotent
• No need to deal with failures

Do not rely on order
• Invisible messages result in out-of-order processing

Use the dequeue count to remove poison messages
• Enforce a threshold on a message's dequeue count

Use a blob to store message data, with a reference in the message
• Messages > 8 KB
• Batch messages
• Garbage collect orphaned blobs

Use message count to scale
• Dynamically increase/reduce workers
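For the "messages > 8 KB" case, here is a sketch of the blob-reference pattern; it assumes a CloudBlobContainer named container and a CloudQueue named queue set up as in the helpers elsewhere in these slides, and largePayload is illustrative.

using System;
using Microsoft.WindowsAzure.StorageClient;

// Producer: store the large payload in a blob, enqueue only its name
string blobName = Guid.NewGuid().ToString();
container.GetBlobReference(blobName).UploadText(largePayload);
queue.AddMessage(new CloudQueueMessage(blobName));

// Consumer: resolve the reference, process, then garbage collect the blob
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
CloudBlob blob = container.GetBlobReference(msg.AsString);
string payload = blob.DownloadText();
// ... process payload ...
blob.Delete();
queue.DeleteMessage(msg);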

Windows Azure Storage Takeaways

Blobs
Drives
Tables
Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                            A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities.cs

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    // …
}

Code – TableEntities.cs

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                            Best Practices

                                                            Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                            Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting up code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                            Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (a sketch follows below)
  • Data parallelism
  • Task parallelism
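A small sketch of both flavors with the .NET 4 Task Parallel Library; workItems and the helper methods are placeholders.

using System.Threading.Tasks;

// Data parallelism: spread independent items across all available cores
Parallel.ForEach(workItems, item => ProcessItem(item));

// Task parallelism: run distinct operations concurrently
Task compress = Task.Factory.StartNew(() => CompressLogs());
Task upload   = Task.Factory.StartNew(() => UploadResults());
Task.WaitAll(compress, upload);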

                                                            Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                            Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idling VMs (performance vs. cost)

                                                            Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• Service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

                                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                            Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content is reduced to compressed content via Gzip, minified JavaScript, minified CSS, and minified images.]
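One way to gzip all output from an ASP.NET web role is a response filter in Global.asax. This is only a sketch: it assumes the client advertises gzip support and that no other response filter is already attached.

using System;
using System.IO.Compression;
using System.Web;

protected void Application_BeginRequest(object sender, EventArgs e)
{
    HttpContext ctx = HttpContext.Current;
    string acceptEncoding = ctx.Request.Headers["Accept-Encoding"];
    if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
    {
        // Compress the response on the fly and tell the browser about it
        ctx.Response.Filter = new GZipStream(ctx.Response.Filter, CompressionMode.Compress);
        ctx.Response.AppendHeader("Content-Encoding", "gzip");
    }
}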

                                                            Best Practices Summary

Doing 'less' is the key to saving costs

                                                            Measure everything

                                                            Know your application profile in and out

                                                            Research Examples in the Cloud

…on another set of slides

                                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx


Questions and Discussion…

                                                            Thank you for hosting me at the Summer School

                                                            BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
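A sketch of how a worker might derive the timeout from the partition size. The variable names (pairBasesInPartition, secondsPerMillionPairBases), the 1.5 safety factor, and the CloudQueue named queue are all assumptions; the per-million-pair-bases rate would come from the profiling test runs.

using System;
using Microsoft.WindowsAzure.StorageClient;

// Estimate the run time from the partition size, pad it, and cap it at the 2-hour maximum.
double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;
double paddedSeconds    = estimatedSeconds * 1.5;    // safety margin against underestimation
double cappedSeconds    = Math.Min(paddedSeconds, TimeSpan.FromHours(2).TotalSeconds);

CloudQueueMessage task = queue.GetMessage(TimeSpan.FromSeconds(cappedSeconds));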

[Diagram: a splitting task divides the input into partitions, many BLAST tasks process the partitions in parallel, and a merging task combines the results.]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Diagram: AzureBLAST architecture. A Web Role hosts the web portal and web service for job registration. A Job Management Role runs the job scheduler, the scaling engine, and a database-updating role, dispatching work to worker instances through a global dispatch queue. Azure Tables hold the job registry; Azure Blobs hold the BLAST/NCBI databases and temporary data. A splitting task fans the input out to BLAST tasks, and a merging task combines their results.]

Web Portal (Web Role)
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually

[Figure: the eight AzureBLAST deployments across the four datacenters, each running roughly 50–62 extra-large instances.]

• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should be:
(otherwise something is wrong, e.g. the task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 mins apart) – this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

                                                            ET = Water volume evapotranspired (m3 s-1 m-2)

                                                            Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                                                            λv = Latent heat of vaporization (Jg)

                                                            Rn = Net radiation (W m-2)

                                                            cp = Specific heat capacity of air (J kg-1 K-1)

                                                            ρa = dry air density (kg m-3)

                                                            δq = vapor pressure deficit (Pa)

                                                            ga = Conductivity of air (inverse of ra) (m s-1)

                                                            gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma \left(1 + g_a / g_s\right)\right) \lambda_v}

Penman-Monteith (1964)

                                                            Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
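A direct transcription of the equation above into code, as a sketch only: the parameter names are assumptions, units follow the definitions listed above, and callers are responsible for supplying consistent inputs.

// ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
static double Evapotranspiration(
    double delta,     // Δ  (Pa K-1)
    double netRad,    // Rn (W m-2)
    double rhoAir,    // ρa (kg m-3)
    double cp,        // cp (J kg-1 K-1)
    double deltaQ,    // δq (Pa)
    double ga,        // ga (m s-1)
    double gs,        // gs (m s-1)
    double lambdaV,   // λv (J g-1)
    double gamma)     // γ  (≈ 66 Pa K-1)
{
    return (delta * netRad + rhoAir * cp * deltaQ * ga)
           / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
}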

Input data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: the MODISAzure pipeline. Scientists submit requests through the AzureMODIS Service Web Role portal; a request queue feeds the Data Collection stage, which pulls source imagery from NASA download sites; the Reprojection, Derivation Reduction (Reduction 1), and Analysis Reduction (Reduction 2) stages are each driven by their own queue, with source metadata in table storage and scientific results available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage> job queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work onto the <PipelineStage> task queue.]

All work is actually done by a Worker Role (the Generic Worker)
• Dequeues tasks created by the Service Monitor
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> task queue; Generic Worker (Worker Role) instances dequeue tasks and read/write the <Input> data storage.]

Reprojection Service

[Diagram: a reprojection request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), then dispatches tasks to Generic Worker instances via the job and task queues. The workers query the SwathGranuleMeta table for geo-metadata (e.g. boundaries) for each swath tile, query the ScanTimeList table for the list of satellite scan times that cover a target tile, read the swath source data storage, and write the reprojection data storage.]

• Computational costs are driven by data scale and the need to run the reduction multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate student rates

[Diagram: AzureMODIS service architecture – scientists interact with the AzureMODIS Service Web Role Portal; requests flow through the Request Queue, source imagery is fetched from the download sites via the Download Queue (Data Collection Stage, with Source Metadata tracked), then through the Reprojection Queue, Reduction 1 Queue and Reduction 2 Queue (Reprojection, Derivation Reduction and Analysis Reduction Stages), and scientific results are downloaded at the end]

Approximate scale and cost per stage:

Stage                 | Data size  | Files     | Compute             | Workers        | Cost
Data Collection       | 400-500 GB | 60K files | 10 MB/sec, 11 hours | <10 workers    | $50 upload, $450 storage
Reprojection          | 400 GB     | 45K files | 3500 hours          | 20-100 workers | $420 cpu, $60 download
Derivation Reduction  | 5-7 GB     | 55K files | 1800 hours          | 20-100 workers | $216 cpu, $1 download, $6 storage
Analysis Reduction    | <10 GB     | ~1K files | 1800 hours          | 20-100 workers | $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed, and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                            • Day 2 - Azure as a PaaS
                                                            • Day 2 - Applications

Windows Azure Tables

• Provides Structured Storage

• Massively Scalable Tables
  • Billions of entities (rows) and TBs of data
  • Can use thousands of servers as traffic grows

• Highly Available & Durable
  • Data is replicated several times

• Familiar and Easy to use API
  • WCF Data Services and OData
  • .NET classes and LINQ
  • REST – with any platform or language

Windows Azure Queues

• Queues are performance-efficient, highly available, and provide reliable message delivery

• Simple, asynchronous work dispatch

• Programming semantics ensure that a message can be processed at least once

• Access is provided via REST

Storage Partitioning

Understanding partitioning is key to understanding performance

• Partitioning is different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by a single server

• The system load balances partitions based on traffic pattern
  • Load balancing can take a few minutes to kick in
  • It can take a couple of seconds for a partition to become available on a different server

• The partition key controls entity locality and is the unit of scale

"Server Busy" responses
• Mean either that the load on a single partition has exceeded its limits, or that the system is load balancing partitions to meet your traffic needs
• Use exponential back-off on "Server Busy" (a sketch follows below)
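A minimal back-off sketch for the point above. The helper name, retry limit and delays are illustrative assumptions, not part of the storage API; in real code you would inspect the exception for a 503/"Server Busy" status rather than catching everything.

using System;
using System.Threading;

public static class RetryHelper
{
    // Retry an operation, doubling the wait after each failed attempt.
    public static void WithBackoff(Action operation, int maxAttempts)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(500);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (Exception)   // in practice, check for a 503 / "Server Busy" response
            {
                if (attempt >= maxAttempts)
                    throw;

                Thread.Sleep(delay);
                delay = delay + delay;   // exponential back-off: 0.5 s, 1 s, 2 s, 4 s, ...
            }
        }
    }
}

For example, a table insert could be wrapped as RetryHelper.WithBackoff(() => context.SaveChanges(), 5).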

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $3512
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $1000

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarborbighouse.jpg
image          | foxboroughgillette.jpg
video          | annarborbighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1

Scalability Targets

Storage Account
• Capacity – Up to 100 TBs
• Transactions – Up to a few thousand requests per second
• Bandwidth – Up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions

When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential back-off

Partitions and Partition Ranges

[Figure: the sample movie table below, partitioned by Category (the PartitionKey), shown both as a whole and split into two partition ranges – roughly [Action, Animation] and [Comedy, War] – each of which can be served by a different server]

PartitionKey (Category) | RowKey (Title)            | Timestamp | ReleaseDate
Action                  | Fast & Furious            | ...       | 2009
Action                  | The Bourne Ultimatum      | ...       | 2007
...                     | ...                       | ...       | ...
Animation               | Open Season 2             | ...       | 2009
Animation               | The Ant Bully             | ...       | 2006
...                     | ...                       | ...       | ...
Comedy                  | Office Space              | ...       | 1999
...                     | ...                       | ...       | ...
SciFi                   | X-Men Origins: Wolverine  | ...       | 2009
...                     | ...                       | ...       | ...
War                     | Defiance                  | ...       | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips (see the sketch below)

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information
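A rough illustration of the entity-group-transaction point above, reusing the AttendeeEntity type and "Attendees" table defined in the code slides later in this deck; the helper name is hypothetical. The key constraint is that every entity in the batch shares the same PartitionKey.

using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class BatchDemo
{
    // All entities in one batch must live in the same partition ("SummerSchool" here);
    // the batch is then sent in one round trip and executed atomically on the server.
    public static void AddAttendees(TableServiceContext context, AttendeeEntity[] newAttendees)
    {
        foreach (AttendeeEntity attendee in newAttendees)
        {
            attendee.UpdateKeys();                     // PartitionKey = "SummerSchool", RowKey = Email
            context.AddObject("Attendees", attendee);
        }

        context.SaveChanges(SaveChangesOptions.Batch); // entity group transaction
    }
}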

Expect Continuation Tokens – Seriously

A query returns a continuation token whenever it hits one of these limits (a sketch of handling them follows):

• Maximum of 1000 rows in a response

• The end of a partition range boundary

• Maximum of 5 seconds to execute the query
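A minimal sketch of dealing with continuation tokens using the Microsoft.WindowsAzure.StorageClient library that appears in the code slides below; the Movie entity and "Movies" table name are hypothetical. AsTableServiceQuery()/Execute() follow continuation tokens (row limits, partition range boundaries, query timeouts) on your behalf as you enumerate.

using System;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity: Category is the PartitionKey, Title is the RowKey.
public class Movie : TableServiceEntity
{
    public DateTime ReleaseDate { get; set; }
}

public static class ContinuationDemo
{
    public static void ListActionMovies(TableServiceContext context)
    {
        // AsTableServiceQuery() wraps the LINQ query in a CloudTableQuery<T>;
        // enumerating Execute() transparently requests the next page whenever a
        // continuation token is returned.
        CloudTableQuery<Movie> query = (from m in context.CreateQuery<Movie>("Movies")
                                        where m.PartitionKey == "Action"
                                        select m).AsTableServiceQuery();

        foreach (Movie movie in query.Execute())
        {
            Console.WriteLine("{0} ({1})", movie.RowKey, movie.ReleaseDate.Year);
        }
    }
}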

Tables Recap

Select PartitionKey and RowKey that help scale
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Avoid "append only" patterns
• Distribute writes by using a hash or similar as a key prefix

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• "Server busy" means the load on a single partition has exceeded the limits, or partitions are being load balanced to meet traffic needs

WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
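A minimal sketch of the IgnoreResourceNotFoundException tip, reusing the AttendeeEntity type and "Attendees" table from the code slides below; the helper name is hypothetical, and a fresh context is created for this one logical operation.

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryDemo
{
    // Returns null instead of throwing when the requested entity does not exist.
    public static AttendeeEntity TryGetAttendee(CloudTableClient client, string email)
    {
        TableServiceContext context = client.GetDataServiceContext();
        context.IgnoreResourceNotFoundException = true;   // a 404 becomes an empty result, not an exception

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).AsTableServiceQuery().Execute().FirstOrDefault();
    }
}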

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together, but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid in scaling and performance

• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Messages are limited to 8 KB in size

• Commonly used with the work ticket pattern (a sketch follows below)

• Why not simply use a table?
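A minimal sketch of the work ticket pattern, assuming the Microsoft.WindowsAzure.StorageClient library used elsewhere in this deck; the "jobdata" container and "jobs" queue names are hypothetical. The queue carries only a small ticket that points at the real payload in blob storage, keeping messages well under the 8 KB limit.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class WorkTickets
{
    public static void Submit(CloudStorageAccount account, string jobId, string payload)
    {
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("jobdata");
        container.CreateIfNotExist();
        container.GetBlobReference(jobId).UploadText(payload);   // the large payload goes to blob storage

        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(jobId));          // the queue carries only the ticket
    }

    public static void ProcessOne(CloudStorageAccount account)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // invisible to others for 30 s
        if (msg == null) return;

        string payload = account.CreateCloudBlobClient()
                                .GetContainerReference("jobdata")
                                .GetBlobReference(msg.AsString)
                                .DownloadText();
        // ... do the work with payload ...
        queue.DeleteMessage(msg);                                // delete only after the work succeeds
    }
}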

Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1 ... Msg 4 to a Queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve Msg 1 and Msg 2, and call RemoveMessage to delete each message once it has been processed]

PutMessage (enqueue):

POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage (dequeue) response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

RemoveMessage (delete):

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1 (a sketch follows below).

[Diagram: consumers C1 and C2 polling a queue, with each consumer's polling interval doubling – 1, 2, 4, ... – while the queue stays empty]
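A minimal sketch of the truncated exponential back-off poll loop above, using the Microsoft.WindowsAzure.StorageClient queue API; the 1-second starting interval and 1-minute cap are illustrative assumptions.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class QueuePoller
{
    public static void Poll(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromMinutes(1);   // "truncated": never back off beyond this

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                // ... process the message ...
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);       // a successful poll resets the interval
            }
            else
            {
                Thread.Sleep(interval);                   // empty poll: wait, then double the interval
                interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
            }
        }
    }
}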

Removing Poison Messages

[Diagram sequence: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them; each message carries a dequeue count]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) – the poison message is removed instead of being retried forever (a sketch follows below)
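A minimal sketch of the dequeue-count check above, using the Microsoft.WindowsAzure.StorageClient queue API; the threshold of 3 is an illustrative assumption.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonHandling
{
    public static void ProcessOne(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > 3)
        {
            // The message has already failed repeatedly: treat it as poison and remove it
            // (optionally copying it somewhere for later inspection before deleting).
            queue.DeleteMessage(msg);
            return;
        }

        // ... normal processing; delete only on success so failures become visible again ...
        queue.DeleteMessage(msg);
    }
}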

Queues Recap

Make message processing idempotent
• Then there is no need to deal with failures explicitly

Do not rely on order
• Messages that become invisible and reappear result in out-of-order processing

Use the dequeue count to remove poison messages
• Enforce a threshold on a message's dequeue count

Use a blob to store message data, with a reference in the message
• For messages larger than 8 KB
• Batch messages
• Garbage collect orphaned blobs

Use the message count to scale
• Dynamically increase/reduce workers

Windows Azure Storage Takeaways

Blobs

Drives

Tables

Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }

    ...
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");

    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

    ...
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                              Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• The fundamental choice: fewer, larger VMs vs. many smaller instances
  • If you scale better than linearly across cores, larger VMs could save you money
  • It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?
  • Common mistake: splitting code into multiple roles, each not using up its CPU
  • Balance using up the CPU against having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (a sketch follows below)
  • Data parallelism
  • Task parallelism
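A minimal Task Parallel Library sketch for the bullet above; the ProcessItem method and the work items are hypothetical placeholders for application-specific work.

using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    public static void ProcessAll(IEnumerable<string> workItems)
    {
        // Data parallelism: the TPL partitions the items across the available cores.
        Parallel.ForEach(workItems, item => ProcessItem(item));

        // Task parallelism: run independent pieces of work concurrently.
        Task compute = Task.Factory.StartNew(() => ProcessItem("cpu-bound step"));
        Task io      = Task.Factory.StartNew(() => ProcessItem("storage I/O step"));
        Task.WaitAll(compute, io);
    }

    private static void ProcessItem(string item)
    {
        // ... application-specific work ...
    }
}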

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find pieces of code that are intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale
  • Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
  • Being too aggressive in spinning down VMs can result in poor user experience

• Trade-off: the risk of failure or poor user experience from not having excess capacity vs. the cost of having idling VMs (performance vs. cost)

Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile

• Caching and compressing help a lot with storage costs

                                                              Saving Bandwidth Costs

                                                              Bandwidth costs are a huge part of any popular web apprsquos billing profile

                                                              Sending fewer things over the wire often means getting fewer things from storage

                                                              Saving bandwidth costs often lead to savings in other places

                                                              Sending fewer things means your VM has time to do other tasks

                                                              All of these tips have the side benefit of improving your web apprsquos performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification and image minification to become compressed content]

A sketch of gzip-compressing stored content follows below.
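A minimal sketch of point 1 applied to blob storage, using the Microsoft.WindowsAzure.StorageClient library from the code slides above; the container and blob names are supplied by the caller, and the text/plain content type is an illustrative assumption.

using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class CompressedBlobs
{
    // Gzip a string before storing it, reducing both storage and download bandwidth.
    public static void UploadCompressedText(CloudBlobContainer container, string blobName, string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);

        using (MemoryStream compressed = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            compressed.Position = 0;

            CloudBlob blob = container.GetBlobReference(blobName);
            blob.Properties.ContentType = "text/plain";
            blob.Properties.ContentEncoding = "gzip";   // browsers decompress on the fly
            blob.UploadFromStream(compressed);
        }
    }
}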

                                                              Best Practices Summary

Doing 'less' is the key to saving costs

                                                              Measure everything

                                                              Know your application profile in and out

                                                              Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx

Questions and Discussion…

                                                              Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done

• Follows the general suggested application model
  • Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait before retry if an instance fails
• Best practice (a sketch follows below):
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
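A minimal sketch of choosing the visibility timeout from an estimate of the task's run time; the per-pair-base rate, the 2x safety factor and the helper name are illustrative assumptions, not figures from the AzureBLAST study.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastDispatch
{
    public static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue, long pairBasesInPartition)
    {
        // Estimated seconds per million pair-bases, measured from test runs.
        const double secondsPerMillionPairBases = 5.0;

        double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases * 2.0;
        TimeSpan visibilityTimeout = TimeSpan.FromSeconds(
            Math.Min(estimatedSeconds, TimeSpan.FromHours(2).TotalSeconds - 1));   // stay under the 2-hour cap

        // If the worker crashes, the task becomes visible again after the timeout expires.
        return taskQueue.GetMessage(visibilityTimeout);
    }
}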

[Diagram: a Splitting task fans the input out into many BLAST tasks, whose outputs are combined by a Merging task]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Diagram: AzureBLAST architecture – a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and tracks jobs in the Job Registry (Azure Table); a global dispatch queue feeds the Worker instances; a Database updating Role refreshes the NCBI databases; Azure Blob storage holds the BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• The full job would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalances appeared, the load was redistributed manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, however, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes per group)
35 nodes experienced blob-writing failures at the same time
West Europe Datacenter: 30,976 tasks were completed, then the job was killed
A reasonable guess: the fault domain at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Evapotranspiration (ET) is the release of water to the atmosphere, by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
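As a concrete reading of the formula, here is a minimal C# sketch that evaluates the Penman-Monteith expression above. The parameter names mirror the symbol list and its units; the function name and the default γ of 66 Pa K⁻¹ (the approximation given above) are ours, not part of the MODISAzure code.

static double Evapotranspiration(double delta, double rn, double rhoA, double cp,
                                 double deltaQ, double gA, double gS,
                                 double lambdaV, double gamma = 66.0)
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    return (delta * rn + rhoA * cp * deltaQ * gA)
           / ((delta + gamma * (1.0 + gA / gS)) * lambdaV);
}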

Data inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: the AzureMODIS service web role portal receives requests; download, reprojection, reduction 1, and reduction 2 queues drive the data collection, reprojection, derivation reduction, and analysis reduction stages; source imagery is pulled from NASA download sites, source metadata is kept in tables, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the web role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated worker role
  • Parses all job requests into tasks, i.e. recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (web role), which persists <PipelineStage>JobStatus; the Service Monitor (worker role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work onto the <PipelineStage>Task queue.]

All work is actually done by a worker role (GenericWorker)
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor parses and persists <PipelineStage>TaskStatus; GenericWorker instances (worker roles) dequeue from the <PipelineStage>Task queue and read/write <Input>Data storage.]

A sketch of such a dequeue/retry loop follows.
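This is not the actual MODISAzure code, just a minimal sketch of the pattern described above using the v1.x StorageClient and ServiceRuntime libraries; the queue name, the "DataConnectionString" setting, and RunTask are illustrative placeholders.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorker : RoleEntryPoint
{
    public override void Run()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojectiontask");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Long visibility timeout: tasks can take a while, and a crashed worker
            // simply lets the message reappear for another instance.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            if (msg.DequeueCount > 3)
            {
                // Already retried 3 times: record the failure in task status and drop the task.
                taskQueue.DeleteMessage(msg);
                continue;
            }

            RunTask(msg.AsString);        // stage in from blob storage, run the executable, stage out
            taskQueue.DeleteMessage(msg); // delete only once the task has completed
        }
    }

    private void RunTask(string taskId) { /* placeholder for the sandboxed executable */ }
}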

Reprojection example:

[Diagram: a reprojection request is parsed by the Service Monitor (worker role), which persists ReprojectionJobStatus and ReprojectionTaskStatus and dispatches tasks onto the task queue for GenericWorker instances; task entities point into reprojection data storage, swath source data storage, and the SwathGranuleMeta and ScanTimeList tables.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and by the need to run the reduction stages multiple times
• Storage costs are driven by data scale and by the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Diagram: the same pipeline as above, annotated with data volumes and costs per stage.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• They provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                              • Day 2 - Azure as a PaaS
                                                              • Day 2 - Applications

Windows Azure Queues
• Queues are performance-efficient, highly available, and provide reliable message delivery
• Simple asynchronous work dispatch
• Programming semantics ensure that a message is processed at least once
• Access is provided via REST
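A minimal sketch of those operations through the v1.x StorageClient wrapper over the REST API; the "DataConnectionString" setting name, the "jobs" queue name, and DoWork are illustrative placeholders.

CloudStorageAccount account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");
queue.CreateIfNotExist();

// Producer side: asynchronous work dispatch is just adding a message.
queue.AddMessage(new CloudQueueMessage("work item 42"));

// Consumer side: the message stays invisible for 30 s; delete it only after the work
// succeeds, so a crashed consumer causes re-delivery (at-least-once processing).
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    DoWork(msg.AsString);      // hypothetical handler
    queue.DeleteMessage(msg);
}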

Storage Partitioning

Understanding partitioning is key to understanding performance
• Every data object has a partition key; it is different for each data type (blobs, entities, queues)
• A partition can be served by a single server
• The system load-balances partitions based on traffic patterns
• The partition key controls entity locality and is the unit of scale
• Load balancing can take a few minutes to kick in
• It can take a couple of seconds for a partition to become available on a different server

How the system load-balances:
• Use exponential backoff on "Server Busy" (see the retry sketch below)
• The system load-balances to meet your traffic needs
• "Server Busy" is also returned when the limits of a single partition have been reached
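A minimal retry sketch, assuming the v1.x StorageClient library used in the code samples later in this document: back off exponentially when storage answers "Server Busy" (HTTP 503). The helper name, delay values, and retry count are illustrative choices, not prescribed by the platform.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

static void WithBackoff(Action storageCall)
{
    int delayMs = 100;
    for (int attempt = 0; ; attempt++)
    {
        try { storageCall(); return; }
        catch (StorageServerException)   // e.g. 503 while a partition is being re-balanced
        {
            if (attempt >= 5) throw;     // give up after a few tries
            Thread.Sleep(delayMs);
            delayMs *= 2;                // 100, 200, 400, 800 ms, ...
        }
    }
}

// Usage (container and payload are illustrative):
// WithBackoff(() => container.GetBlobReference("results.txt").UploadText(payload));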

Partition Keys In Each Abstraction

Entities – TableName + PartitionKey
• Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $3512
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $1000

Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

Messages – Queue Name
• All messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message 1
jobs     | Message 2
workflow | Message 1

Scalability Targets

Storage Account
• Capacity: up to 100 TB
• Transactions: up to a few thousand requests per second
• Bandwidth: up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

Partitions and Partition Ranges

The same movie table shown as two partition ranges and as the full table; each range of partitions can be served by a different server.

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics and reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.

Expect Continuation Tokens – Seriously

You will get a continuation token whenever a query hits any of these limits:
• Maximum of 1000 rows in a response
• The end of a partition range boundary
• Maximum of 5 seconds to execute the query
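A minimal sketch of how continuation tokens are usually handled with the v1.x StorageClient: a CloudTableQuery (obtained via AsTableServiceQuery) transparently follows the x-ms-continuation-* tokens, whereas a plain DataServiceQuery stops after the first response. The MovieEntity class and "Movies" table name are hypothetical, shaped like the movie table above.

TableServiceContext context = tableClient.GetDataServiceContext();

CloudTableQuery<MovieEntity> query =
    (from m in context.CreateQuery<MovieEntity>("Movies")
     where m.PartitionKey == "Action"
     select m).AsTableServiceQuery();

foreach (MovieEntity movie in query)   // follow-up requests are issued lazily as tokens are returned
{
    Console.WriteLine(movie.RowKey);
}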

Tables Recap
• Select a PartitionKey and RowKey that help scale: efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns: distribute writes by using a hash or similar prefix
• Always handle continuation tokens: expect them for range queries
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: "Server Busy" means either the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException
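A minimal point-query sketch following those three rules, using the v1.x StorageClient and WCF Data Services; MovieEntity and the "Movies" table are the same hypothetical names used above.

TableServiceContext context = tableClient.GetDataServiceContext(); // fresh context for this operation
context.IgnoreResourceNotFoundException = true;                    // a miss returns null instead of throwing

MovieEntity movie =
    (from m in context.CreateQuery<MovieEntity>("Movies")
     where m.PartitionKey == "Action" && m.RowKey == "The Bourne Ultimatum"
     select m).FirstOrDefault();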

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This decoupling can aid scaling and performance
• A queue can hold an unlimited number of messages; messages must be serializable as XML and are limited to 8 KB in size
• Commonly used with the work ticket pattern (see the sketch below)
• Why not simply use a table?
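A minimal sketch of the work ticket pattern, assuming the v1.x StorageClient: the (possibly larger than 8 KB) payload lives in blob storage and the queue message carries only a reference. The container, jobQueue, and jobDescriptionXml names are illustrative.

// Producer side
string ticket = Guid.NewGuid().ToString() + ".xml";
container.GetBlobReference(ticket).UploadText(jobDescriptionXml); // full job description in a blob
jobQueue.AddMessage(new CloudQueueMessage(ticket));               // small "work ticket" in the queue

// Worker side
CloudQueueMessage msg = jobQueue.GetMessage(TimeSpan.FromMinutes(5));
if (msg != null)
{
    string jobXml = container.GetBlobReference(msg.AsString).DownloadText();
    // ... process the job ...
    jobQueue.DeleteMessage(msg);
}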

Queue Terminology

Message Lifecycle
[Diagram: a web role puts messages onto a queue (PutMessage); worker roles retrieve them with GetMessage (with a visibility timeout) and, once a message has been processed, RemoveMessage deletes it from the queue.]

PutMessage:

POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage:

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach:
• Each empty poll increases the polling interval by 2x
• A successful poll resets the interval back to 1
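A minimal sketch of such a polling loop, assuming the v1.x StorageClient; the queue, the 64-second cap, and ProcessMessage are illustrative choices.

TimeSpan minDelay = TimeSpan.FromSeconds(1);
TimeSpan maxDelay = TimeSpan.FromSeconds(64);
TimeSpan delay = minDelay;

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg == null)
    {
        Thread.Sleep(delay);                                            // empty poll: wait, then double the interval
        delay = TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2,
                                              maxDelay.TotalSeconds));  // truncate at the cap
    }
    else
    {
        delay = minDelay;                                               // successful poll: back to the fastest rate
        ProcessMessage(msg);                                            // hypothetical handler
        queue.DeleteMessage(msg);
    }
}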


Removing Poison Messages

Producers P1 and P2 put messages onto the queue; consumers C1 and C2 process them:
1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) returns msg 1
12. msg 1's DequeueCount is now > 2
13. C1: DeleteMessage(Q, msg 1): the message is treated as poison and removed
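A minimal sketch of that check in code, assuming the v1.x StorageClient; the threshold of 2 matches the scenario above, and ProcessMessage is an illustrative handler that must be idempotent.

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // The message has already failed repeatedly: treat it as poison and remove it.
        // (Optionally copy it to a "poison" queue or a blob first, for later inspection.)
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg);      // must be idempotent: it may run more than once
        queue.DeleteMessage(msg); // delete only after successful processing
    }
}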

Queues Recap
• Make message processing idempotent: then there is no need to deal with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• For messages larger than 8 KB: use a blob to store the message data with a reference in the message, batch messages, and garbage-collect orphaned blobs
• Use the message count to scale: dynamically increase or reduce the number of workers


Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
            {
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            }
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        }
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
    {
        InitContainer();
    }
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
    {
        InitContainer();
    }
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
            {
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            }
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
            {
                context = Client.GetDataServiceContext();
            }
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception.
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (!allAttendees.ContainsKey(email))
    {
        return;
    }
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the in-memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (allAttendees.ContainsKey(email))
    {
        return allAttendees[email];
    }
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
    {
        Context.SaveChanges();
    }
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each barely using the CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
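A minimal sketch of both flavors with the .NET 4 Task Parallel Library; inputFiles, ProcessFile, DownloadNextBatch, and WriteStatusReport are illustrative names, not part of any sample in this deck.

using System.Threading.Tasks;

// Data parallelism: spread independent work items across the cores of one VM.
Parallel.ForEach(inputFiles, file => ProcessFile(file));

// Task parallelism: run different kinds of work concurrently.
Task download = Task.Factory.StartNew(() => DownloadNextBatch());
Task report   = Task.Factory.StartNew(() => WriteStatusReport());
Task.WaitAll(download, report);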

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find pieces of code that are intensive in different resources and have them live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs
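A minimal sketch of the kind of conservative scale-in logic these bullets suggest; the thresholds, step sizes, and minimum instance count below are assumptions chosen to illustrate the trade-off, not recommended values.

# Toy autoscaling decision: scale out quickly, scale in cautiously.
def desired_instance_count(current, avg_cpu_percent, queue_depth,
                           min_instances=2, max_instances=20):
    if avg_cpu_percent > 75 or queue_depth > 100:
        return min(current + 2, max_instances)   # add capacity aggressively
    if avg_cpu_percent < 25 and queue_depth == 0:
        return max(current - 1, min_instances)   # shed capacity slowly
    return current                               # otherwise hold steady

print(desired_instance_count(current=4, avg_cpu_percent=80, queue_depth=10))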

                                                                Performance Cost

                                                                Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference depending on your app profile

• Caching and compressing: they help a lot with storage costs

                                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                                Compressing Content

1. Gzip all output content

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content; Minify JavaScript, Minify CSS, Minify Images]
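A small sketch of the gzip point above using Python's standard library; compressing text-like payloads before storing or serving them trades a little CPU for smaller storage and bandwidth bills.

# Gzip a text payload before storing/serving it; modern browsers and HTTP
# clients can decompress on the fly (Content-Encoding: gzip).
import gzip

def compress_text(payload: str) -> bytes:
    return gzip.compress(payload.encode("utf-8"))

def decompress_text(blob: bytes) -> str:
    return gzip.decompress(blob).decode("utf-8")

html = "<html><body>" + "hello azure " * 1000 + "</body></html>"
packed = compress_text(html)
print(len(html), "->", len(packed), "bytes")
assert decompress_text(packed) == html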

                                                                Best Practices Summary

Doing 'less' is the key to saving costs

                                                                Measure everything

                                                                Know your application profile in and out

                                                                Research Examples in the Cloud

…on another set of slides

                                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                Project Daytona - Map Reduce on Azure

                                                                httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx


Questions and Discussion…

                                                                Thank you for hosting me at the Summer School

                                                                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between biological sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700–1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large data volumes

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010

A simple Split/Join pattern
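A minimal sketch of the split/join idea named above: split the input sequences into partitions, query each partition in parallel, then merge the results when done. The partitioning helper and the `run_blast_on_partition` call are hypothetical placeholders, not the AzureBLAST implementation.

# Split/join sketch: partition input sequences, process partitions in
# parallel, merge results.
from concurrent.futures import ProcessPoolExecutor

def split(sequences, partition_size=100):
    for i in range(0, len(sequences), partition_size):
        yield sequences[i:i + partition_size]

def run_blast_on_partition(partition):
    # placeholder: a real worker would shell out to NCBI-BLAST here
    return [f"hit-for-{seq}" for seq in partition]

def split_join(sequences, partition_size=100):
    with ProcessPoolExecutor() as pool:
        partial = pool.map(run_blast_on_partition,
                           split(sequences, partition_size))
    merged = []
    for result in partial:            # join: merge per-partition results
        merged.extend(result)
    return merged

if __name__ == "__main__":
    print(len(split_join([f"seq{i}" for i in range(250)], partition_size=100)))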

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• too small: repeated computation
• too large: an unnecessarily long wait in case of instance failure
Best practice:
• Estimate the value based on the number of base pairs in the partition and on test runs
• Watch out for the 2-hour maximum limit
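A small sketch of that estimation rule; the per-megabase rate and safety factor below are made-up placeholders that you would calibrate from your own test runs, and the result is clamped to the 2-hour queue maximum.

# Estimate a visibilityTimeout for a BLAST task from the number of base
# pairs in its partition. SECONDS_PER_MEGABASE is an assumed calibration
# constant; the result is clamped to the 2-hour maximum.
SECONDS_PER_MEGABASE = 90          # placeholder, calibrate from test runs
SAFETY_FACTOR = 1.5                # head-room so tasks rarely overrun
MAX_VISIBILITY_TIMEOUT = 2 * 3600  # 2-hour queue limit

def estimate_visibility_timeout(base_pairs: int) -> int:
    estimate = (base_pairs / 1_000_000) * SECONDS_PER_MEGABASE * SAFETY_FACTOR
    return int(min(max(estimate, 60), MAX_VISIBILITY_TIMEOUT))

print(estimate_visibility_timeout(5_000_000))   # ~675 seconds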

[Diagram: Splitting task → BLAST tasks (…) → Merging task]

Task size vs. performance

• Benefit of the warm cache effect

• 100 sequences per partition is the best choice

Instance size vs. performance

• Super-linear speedup with larger worker instances

• Primarily due to memory capacity

Task size / instance size vs. cost

• Extra-large instances generated the best and most economical throughput

• Fully utilize the resources

[Architecture diagram: Web Portal / Web Service (Web Role) with job registration → Job Scheduler (Job Management Role) with Scaling Engine → global dispatch queue → Worker roles; Azure Table holds the Job Registry; Azure Blob holds the NCBI/BLAST databases, temporary data, etc.; a database-updating Role keeps the databases current. Task flow: Splitting task → BLAST tasks (…) → Merging task]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: Web Portal / Web Service (Job Portal) – job registration → Job Scheduler with Scaling Engine and Job Registry]

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

BLASTed ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters – US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6–8 days

• Look into the log data to analyze what took place…


A normal log record should look like this:

Otherwise something is wrong (e.g. the task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group

This is an update domain

~30 mins

~6 nodes in one group

35 nodes experienced blob writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed

A reasonable guess: a fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants
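Spelled out as code, the Penman-Monteith formula above is a single expression over the symbols defined earlier; the sketch below assumes the caller supplies values in mutually consistent SI units, with the γ default taken from the slide.

# Penman-Monteith ET, using the symbols defined above.
def evapotranspiration(delta, r_n, rho_a, c_p, dq, g_a, g_s, lambda_v, gamma=66.0):
    """ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)"""
    return (delta * r_n + rho_a * c_p * dq * g_a) / (
        (delta + gamma * (1.0 + g_a / g_s)) * lambda_v)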

NASA MODIS imagery source archives: 5 TB (600K files)

FLUXNET curated sensor dataset: 30 GB (960 files)

FLUXNET curated field dataset: 2 KB (1 file)

NCEP/NCAR: ~100 MB (4K files)

Vegetative clumping: ~5 MB (1 file)

Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: AzureMODIS Service Web Role Portal receives requests → Request Queue → Download Queue (Data Collection Stage, pulling from Source Imagery Download Sites) → Reprojection Queue (Reprojection Stage) → Reduction 1 Queue (Derivation Reduction Stage) → Reduction 2 Queue (Analysis Reduction Stage) → Scientific Results Download for scientists; Source Metadata is stored alongside]

                                                                httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues requests to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

[Diagram: <PipelineStage>Request → MODISAzure Service (Web Role) → persist <PipelineStage>JobStatus and enqueue to <PipelineStage>Job Queue → Service Monitor (Worker Role) → parse & persist <PipelineStage>TaskStatus → dispatch to <PipelineStage>Task Queue]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus → dispatches to <PipelineStage>Task Queue → GenericWorker (Worker Role) instances, which read <Input>Data Storage]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

[Diagram: Reprojection Request → Service Monitor (Worker Role) persists ReprojectionJobStatus, parses & persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue → GenericWorker (Worker Role) instances; supporting tables: ScanTimeList, SwathGranuleMeta; storage: Reprojection Data Storage and Swath Source Data Storage]

• Each ReprojectionJobStatus entity specifies a single reprojection job request

• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)

• Query SwathGranuleMeta to get geo-metadata (e.g. boundaries) for each swath tile

• Query ScanTimeList to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times

• Storage costs are driven by the data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate student rates

[Cost summary over the AzureMODIS pipeline (AzureMODIS Service Web Role Portal; Data Collection → Reprojection → Derivation Reduction → Analysis Reduction)]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage

Reprojection Stage: 400 GB, 45K files, 3500 CPU hours, 20-100 workers – $420 compute, $60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 CPU hours, 20-100 workers – $216 compute, $1 download, $6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 CPU hours, 20-100 workers – $216 compute, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• They provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                • Day 2 - Azure as a PaaS
                                                                • Day 2 - Applications

                                                                  Storage Partitioning

Understanding partitioning is key to understanding performance

• Different for each data type (blobs, entities, queues); every data object has a partition key

• A partition can be served by only a single server

• The system load-balances partitions based on traffic patterns

• Controls entity locality; the partition key is the unit of scale

• Load balancing can take a few minutes to kick in

• It can take a couple of seconds for a partition to become available on a different server

System load balancing

• Use exponential backoff on "Server Busy"

• The system load-balances to meet your traffic needs

• "Server Busy" can also mean the limits of a single partition have been reached

                                                                  Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey: entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind)      | Name         | CreditCardNumber    | OrderTotal
1                         | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1                         | Order – 1             |              |                     | $35.12
2                         | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2                         | Order – 3             |              |                     | $10.00

• Blobs – Container name + Blob name: every blob and its snapshots are in a single partition

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg

• Messages – Queue name: all messages for a single queue belong to the same partition

Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1
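A small sketch of how the keys in the customer/order example above might be composed in application code; the dictionary shape is purely illustrative, not an Azure SDK entity type.

# Composing partition/row keys as in the customer/order example above.
def customer_entity(customer_id, name):
    return {"PartitionKey": str(customer_id),      # groups a customer's rows
            "RowKey": f"Customer-{name}",
            "Name": name}

def order_entity(customer_id, order_number, total):
    return {"PartitionKey": str(customer_id),      # same partition as the customer
            "RowKey": f"Order-{order_number}",
            "OrderTotal": total}

# Both entities share PartitionKey "1", so they live in the same partition
# and can participate in one entity group transaction.
print(customer_entity(1, "John Smith"))
print(order_entity(1, 1, 35.12))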

                                                                  Scalability Targets

Storage Account

• Capacity – up to 100 TB

• Transactions – up to a few thousand requests per second

• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition

• Up to 500 transactions per second

To go above these numbers, partition between multiple storage accounts and partitions

When a limit is hit, the app will see '503 Server Busy'; applications should implement an exponential backoff

Single Blob Partition

• Throughput up to 60 MB/s
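A minimal sketch of the exponential backoff those limits call for when a '503 Server Busy' comes back; `do_storage_request` is a hypothetical stand-in for whatever storage call your client makes, and the delays are illustrative.

# Retry a storage call with exponential backoff when the service reports
# "server busy" (HTTP 503).
import random
import time

class ServerBusyError(Exception):
    pass

def with_backoff(do_storage_request, max_attempts=6, base_delay=0.5):
    for attempt in range(max_attempts):
        try:
            return do_storage_request()
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)   # 0.5s, 1s, 2s, 4s, ... plus jitter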

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

                                                                  Partitions and Partition Ranges

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load-balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See httpwwwmicrosoftpdccom2009SVC09 and httpazurescopecloudappnet for more information

Expect Continuation Tokens – Seriously

A continuation token is returned:

• After a maximum of 1000 rows in a response

• At the end of a partition range boundary

• After a maximum of 5 seconds of query execution
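A generic sketch of the loop this implies: keep issuing the query until no continuation token comes back. `query_page` is a hypothetical function returning a page of rows plus an opaque token, not a specific Azure client API.

# Drain a paged table query by following continuation tokens until none is
# returned. `query_page(continuation_token=...)` is assumed to return
# (rows, continuation_token_or_None) for one request.
def query_all(query_page):
    rows, token = [], None
    while True:
        page, token = query_page(continuation_token=token)
        rows.extend(page)
        if token is None:        # no continuation token: query is complete
            return rows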

Tables Recap

• Efficient for frequently used queries

• Supports batch transactions

• Distributes load

Select a PartitionKey and RowKey that help scale
• Distribute by using a hash etc. as a prefix

Avoid "append only" patterns

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• Server busy: partitions are load-balanced to meet traffic needs, or the load on a single partition has exceeded the limits

                                                                  WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid scaling and performance

• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size

• Commonly used with the work ticket pattern

• Why not simply use a table?

                                                                  Queue Terminology

                                                                  Message Lifecycle

[Diagram: Web Role – PutMessage → Queue (Msg 1, Msg 2, Msg 3, Msg 4) – GetMessage (Timeout) → Worker Roles, which process Msg 1 / Msg 2 and then call RemoveMessage]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
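A sketch of the work ticket pattern implied by this lifecycle: get a message with a visibility timeout, do the work, and delete the message only after the work succeeds, so a crashed worker's message reappears for another worker. The `queue` object here is a hypothetical client with get/delete methods, not the REST API shown above.

# Work ticket consumer loop: GetMessage with a visibility timeout, process,
# then RemoveMessage only on success.
def consume(queue, handle, visibility_timeout=30):
    while True:
        msg = queue.get_message(visibility_timeout=visibility_timeout)
        if msg is None:
            continue                      # empty poll (or back off; see below)
        try:
            handle(msg.body)              # the "work ticket" points at the data
        except Exception:
            continue                      # leave the message; it reappears after the timeout
        queue.delete_message(msg.id, msg.pop_receipt)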

                                                                  Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1.
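A minimal sketch of that policy, assuming a hypothetical `queue.get_message()` that returns None when the queue is empty; the interval doubles on each empty poll up to a cap and resets to 1 second after a successful poll.

# Truncated exponential backoff polling.
import time

def poll_with_backoff(queue, handle, max_interval=64):
    interval = 1
    while True:
        msg = queue.get_message()
        if msg is None:
            time.sleep(interval)
            interval = min(interval * 2, max_interval)   # empty poll: back off
        else:
            handle(msg)
            interval = 1                                 # success: reset interval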

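A minimal sketch of truncated exponential back-off polling inside a worker role's loop; the 1-second floor and 60-second cap are illustrative choices, not values from the deck.

    // Truncated exponential back-off polling (floor/cap values are assumptions).
    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BackOffPoller
    {
        public static void PollForever(CloudQueue queue)
        {
            TimeSpan minInterval = TimeSpan.FromSeconds(1);
            TimeSpan maxInterval = TimeSpan.FromSeconds(60);   // the "truncated" cap
            TimeSpan interval = minInterval;

            while (true)
            {
                CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
                if (msg != null)
                {
                    // Successful poll: process, delete, and reset the interval to the floor.
                    Process(msg);
                    queue.DeleteMessage(msg);
                    interval = minInterval;
                }
                else
                {
                    // Empty poll: wait, then double the interval (up to the cap).
                    Thread.Sleep(interval);
                    double doubled = interval.TotalSeconds * 2;
                    interval = TimeSpan.FromSeconds(Math.Min(doubled, maxInterval.TotalSeconds));
                }
            }
        }

        private static void Process(CloudQueueMessage msg) { /* application-specific work */ }
    }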

Removing Poison Messages

[Diagram, spanning three slides: producers P1 and P2 put messages on queue Q; consumers C1 and C2 dequeue them. The annotated event sequence is:]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1), removing the poison message
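A sketch of the corresponding consumer logic: enforce a threshold on DequeueCount and move repeatedly failing messages aside. The threshold of 3 and the poison-blob container are assumptions, not deck code.

    // Poison-message handling sketch (threshold and container name are assumptions).
    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public static class PoisonAwareConsumer
    {
        private const int MaxDequeueCount = 3;

        public static void ProcessOne(CloudQueue queue, CloudBlobContainer poisonStore)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg == null) return;

            if (msg.DequeueCount > MaxDequeueCount)
            {
                // The message has repeatedly failed: park it for later inspection
                // and delete it so it stops blocking the workers.
                poisonStore.GetBlobReference(msg.Id).UploadText(msg.AsString);
                queue.DeleteMessage(msg);
                return;
            }

            // Processing must be idempotent: the message may be delivered more than once.
            DoWork(msg.AsString);
            queue.DeleteMessage(msg);
        }

        private static void DoWork(string body) { /* application-specific work */ }
    }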

Queues Recap

• Make message processing idempotent, so there is no need to deal with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• For messages > 8 KB, use a blob to store the message data, with a reference in the message
  • Batch messages
  • Garbage-collect orphaned blobs
• Use the message count to dynamically increase or reduce workers (see the scaling sketch below)

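A sketch of scaling on message count, as suggested in the last recap item. RetrieveApproximateMessageCount comes from the StorageClient library; the thresholds and the decision helper are assumptions, and actually changing the instance count would go through the Service Management API or the portal.

    // Queue-length-based scaling decision (thresholds are illustrative).
    using Microsoft.WindowsAzure.StorageClient;

    public static class QueueBasedScaler
    {
        public static int DecideWorkerCount(CloudQueue queue, int currentWorkers)
        {
            int backlog = queue.RetrieveApproximateMessageCount();

            if (backlog > 1000 && currentWorkers < 20)
                return currentWorkers + 1;   // falling behind: add an instance
            if (backlog < 100 && currentWorkers > 2)
                return currentWorkers - 1;   // mostly idle: remove an instance
            return currentWorkers;
        }
    }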

                                                                  Windows Azure Storage Takeaways

• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net


                                                                  A Quick Exercise

…Then let's look at some code and some tools.

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables; a point-query alternative is sketched below.
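When the table does not easily fit in memory, one alternative pattern (a sketch, not from the deck) is a point query by PartitionKey and RowKey against the same TableServiceContext, fetching a single entity on demand as an extra TableHelper method:

    // Point-query sketch: fetch one entity by its keys instead of caching the table.
    // (Uses the same Context, tableName, and AttendeeEntity defined above.)
    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    public AttendeeEntity GetAttendeeDirect(string email)
    {
        CloudTableQuery<AttendeeEntity> query =
            (from a in Context.CreateQuery<AttendeeEntity>(tableName)
             where a.PartitionKey == "SummerSchool" && a.RowKey == email
             select a).AsTableServiceQuery();

        return query.Execute().FirstOrDefault();
    }

Because both keys are specified, the table service can serve this as an efficient point lookup rather than a scan.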


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                                  Best Practices

                                                                  Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
  • If you scale better than linearly across cores, larger VMs could save you money
  • It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using much CPU
• Balance using up CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
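For the .NET 4 route, here is a minimal Task Parallel Library sketch; the work items and helper methods are placeholders, not deck code.

    // TPL sketch (.NET 4): data parallelism and task parallelism.
    using System.Threading.Tasks;

    public static class ConcurrencyExamples
    {
        // Data parallelism: apply the same operation to many items across cores.
        public static void ProcessPartitions(string[] partitions)
        {
            Parallel.ForEach(partitions, partition =>
            {
                ProcessPartition(partition);   // CPU-bound work per item
            });
        }

        // Task parallelism: run different operations concurrently and wait for both.
        public static void ProcessIndependentSteps()
        {
            Task download = Task.Factory.StartNew(() => DownloadInput());
            Task cleanup  = Task.Factory.StartNew(() => CleanUpOldResults());
            Task.WaitAll(download, cleanup);
        }

        private static void ProcessPartition(string partition) { /* ... */ }
        private static void DownloadInput() { /* ... */ }
        private static void CleanUpOldResults() { /* ... */ }
    }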

Finding Good Code Neighbors

• Typically code falls into one or more of these categories:
  • Memory intensive
  • CPU intensive
  • Network I/O intensive
  • Storage I/O intensive
• Find code that is intensive with different resources to live together
  • Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
  • Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
  • Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs (performance vs. cost)

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Saving bandwidth costs often leads to savings in other places:
• Sending fewer things over the wire often means getting fewer things from storage
• Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content (a sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content becomes compressed content via Gzip, minified JavaScript, minified CSS, and minified images.]
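For point 1, here is a minimal sketch of gzip-compressing ASP.NET responses with System.IO.Compression; the HTTP module wiring and the Accept-Encoding check are standard ASP.NET techniques assumed here, not code from the deck.

    // Gzip the response when the browser advertises support (ASP.NET sketch).
    using System;
    using System.IO.Compression;
    using System.Web;

    public class CompressionModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            app.BeginRequest += (sender, e) =>
            {
                HttpContext context = app.Context;
                string acceptEncoding = context.Request.Headers["Accept-Encoding"];

                // Only compress if the client can decompress on the fly.
                if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
                {
                    context.Response.Filter =
                        new GZipStream(context.Response.Filter, CompressionMode.Compress);
                    context.Response.AppendHeader("Content-Encoding", "gzip");
                }
            };
        }

        public void Dispose() { }
    }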

                                                                  Best Practices Summary

Doing 'less' is the key to saving costs

                                                                  Measure everything

                                                                  Know your application profile in and out

                                                                  Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                  Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                  Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting time in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
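A sketch of what a worker's loop could look like under this split/join pattern; the queue and container names, the fixed 90-minute visibility timeout, and the blastall command line are illustrative assumptions, not the AzureBLAST implementation.

    // Query-segmentation BLAST worker sketch (names and heuristics assumed).
    using System;
    using System.Diagnostics;
    using Microsoft.WindowsAzure.StorageClient;

    public class BlastWorker
    {
        private CloudQueue taskQueue;          // one message per input partition
        private CloudBlobContainer partitions; // input partitions from the splitting task
        private CloudBlobContainer results;    // outputs consumed by the merging task

        public void ProcessNextPartition()
        {
            // The visibility timeout estimates the task run time (here a fixed 90
            // minutes for illustration): too small causes repeated computation,
            // too large causes long waits after an instance failure, 2 hours is the cap.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(90));
            if (msg == null) return;

            string partitionName = msg.AsString;
            string localInput = @"C:\blast\" + partitionName;
            string localOutput = localInput + ".out";

            partitions.GetBlobReference(partitionName).DownloadToFile(localInput);

            // "-a" sets the thread count to match the core count of the instance size.
            Process blast = Process.Start("blastall",
                "-p blastp -d nr -a 8 -i " + localInput + " -o " + localOutput);
            blast.WaitForExit();

            results.GetBlobReference(partitionName + ".out").UploadFile(localOutput);
            taskQueue.DeleteMessage(msg);
        }
    }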

[Diagram: a splitting task fans the input out to many BLAST tasks that run in parallel; a merging task joins their results when done.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[Diagram: AzureBLAST architecture. A Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler and scaling engine and dispatches work through a global dispatch queue to worker instances, which execute the splitting task, the parallel BLAST tasks, and the merging task. Azure Table storage holds the job registry; Azure Blob storage holds the NCBI BLAST databases and temporary data; a database updating role keeps the databases current.]

Web Portal: an ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the job portal (web portal and web service) feeds job registrations to the job scheduler, which works with the scaling engine and the job registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences in total to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four data centers: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When the load is imbalanced, redistribute it manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like the lines below.

Otherwise something is wrong (e.g. the task failed to complete).

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, over ~30 mins): this is an update domain
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed, and then the job was killed

A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ · Rn + ρa · cp · δq · ga) / ((Δ + γ · (1 + ga/gs)) · λv)

                                                                  Penman-Monteith (1964)

                                                                  Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Data inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: the MODISAzure processing pipeline. The AzureMODIS service web role portal receives requests on a request queue; the data collection stage pulls source imagery from the download sites via a download queue; the reprojection stage (reprojection queue), derivation reduction stage (reduction 1 queue), and analysis reduction stage (reduction 2 queue) follow, with source metadata kept alongside; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues it on the <PipelineStage> job queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage> task queue.]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> task queue; GenericWorker (Worker Role) instances dequeue tasks and marshal data to and from <Input> data storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), then dispatches onto the task queue for GenericWorker (Worker Role) instances. The SwathGranuleMeta table is queried for geo-metadata (e.g. boundaries) of each swath tile; the ScanTimeList table is queried for the list of satellite scan times that cover a target tile; workers read swath source data storage and write reprojection data storage.]

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: the same pipeline as above, annotated with data volumes and costs:]

• Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                  • Day 2 - Azure as a PaaS
                                                                  • Day 2 - Applications

Partition Keys In Each Abstraction

• Entities – TableName + PartitionKey. Entities with the same PartitionKey value are served from the same partition.

  PartitionKey (CustomerId)   RowKey (RowKind)        Name           CreditCardNumber      OrderTotal
  1                           Customer-John Smith     John Smith     xxxx-xxxx-xxxx-xxxx
  1                           Order – 1                                                    $35.12
  2                           Customer-Bill Johnson   Bill Johnson   xxxx-xxxx-xxxx-xxxx
  2                           Order – 3                                                    $10.00

• Blobs – Container name + Blob name. Every blob and its snapshots are in a single partition.

  Container Name   Blob Name
  image            annarborbighouse.jpg
  image            foxboroughgillette.jpg
  video            annarborbighouse.jpg

• Messages – Queue name. All messages for a single queue belong to the same partition.

  Queue      Message
  jobs       Message1
  jobs       Message2
  workflow   Message1

Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.

When a limit is hit, the application will see '503 Server Busy'; applications should implement exponential back-off (see the retry sketch below).
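A minimal sketch of such a back-off retry, assuming the v1.x StorageClient library used elsewhere in these samples; the delay values, attempt limit, and the StorageClientException status check are illustrative assumptions rather than anything prescribed by the slides:

using System;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryHelper
{
    // Retry a storage operation with exponential back-off when the service
    // responds with "503 Server Busy".
    public static void ExecuteWithBackOff(Action storageOperation)
    {
        const int maxAttempts = 5;                              // illustrative limit
        TimeSpan delay = TimeSpan.FromMilliseconds(500);        // illustrative initial delay

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                storageOperation();
                return;                                         // success
            }
            catch (StorageClientException e)
            {
                bool serverBusy = e.StatusCode == HttpStatusCode.ServiceUnavailable;
                if (!serverBusy || attempt >= maxAttempts)
                {
                    throw;                                      // not retryable, or out of attempts
                }
                Thread.Sleep(delay);                            // wait, then double the delay
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
            }
        }
    }
}

A call site would then look like RetryHelper.ExecuteWithBackOff(() => container.CreateIfNotExist()).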

Partitions and Partition Ranges

A single movie table, keyed by Category (PartitionKey) and Title (RowKey):

  PartitionKey (Category)   RowKey (Title)             Timestamp   ReleaseDate
  Action                    Fast & Furious             …           2009
  Action                    The Bourne Ultimatum       …           2007
  …                         …                          …           …
  Animation                 Open Season 2              …           2009
  Animation                 The Ant Bully              …           2006
  …                         …                          …           …
  Comedy                    Office Space               …           1999
  …                         …                          …           …
  SciFi                     X-Men Origins: Wolverine   …           2009
  …                         …                          …           …
  War                       Defiance                   …           2008

The same table can be served as separate partition ranges, for example one range covering Action through Animation and another covering Comedy through War:

  PartitionKey (Category)   RowKey (Title)             Timestamp   ReleaseDate
  Action                    Fast & Furious             …           2009
  Action                    The Bourne Ultimatum       …           2007
  …                         …                          …           …
  Animation                 Open Season 2              …           2009
  Animation                 The Ant Bully              …           2006

  PartitionKey (Category)   RowKey (Title)             Timestamp   ReleaseDate
  Comedy                    Office Space               …           1999
  …                         …                          …           …
  SciFi                     X-Men Origins: Wolverine   …           2009
  …                         …                          …           …
  War                       Defiance                   …           2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity Group Transactions
• Transactions across a single partition
• Transaction semantics and fewer round trips (see the batch sketch below)

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.
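For illustration, a minimal sketch of an entity group transaction using the WCF Data Services context; it reuses the AttendeeEntity type and "Attendees" table name from the code samples later in these slides, and assumes all entities share one PartitionKey:

using System.Collections.Generic;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class BatchSample
{
    // Add a set of entities that share a PartitionKey and commit them as one
    // entity group transaction (a single, atomic round trip).
    public static void AddAttendeesAtomically(TableServiceContext context,
                                              IEnumerable<AttendeeEntity> attendees)
    {
        foreach (AttendeeEntity attendee in attendees)
        {
            // All entities in a batch must live in the same partition
            // (here, PartitionKey = "SummerSchool" as set by UpdateKeys()).
            context.AddObject("Attendees", attendee);
        }
        context.SaveChanges(SaveChangesOptions.Batch);
    }
}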

Expect Continuation Tokens – Seriously

A query can return a continuation token whenever it hits:
• The maximum of 1000 rows in a response
• The end of a partition range boundary
• The maximum of 5 seconds to execute the query
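A minimal sketch of handling continuation tokens with the raw WCF Data Services client (the CloudTableQuery/AsTableServiceQuery helper shown later follows continuations for you); AttendeeEntity and the "Attendees" table name are borrowed from the later code samples:

using System.Collections.Generic;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class ContinuationSample
{
    // Drain a table query page by page, following continuation tokens until
    // the server reports there are no more results.
    public static List<AttendeeEntity> ReadAllRows(TableServiceContext context)
    {
        var results = new List<AttendeeEntity>();
        DataServiceQuery<AttendeeEntity> query =
            context.CreateQuery<AttendeeEntity>("Attendees");

        var response = (QueryOperationResponse<AttendeeEntity>)query.Execute();
        while (true)
        {
            foreach (AttendeeEntity row in response)
            {
                results.Add(row);                       // consume this page
            }
            DataServiceQueryContinuation<AttendeeEntity> token = response.GetContinuation();
            if (token == null)
            {
                break;                                  // no continuation token: we are done
            }
            response = context.Execute(token);          // fetch the next page
        }
        return results;
    }
}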

Tables Recap

Select a PartitionKey and RowKey that help scale
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Avoid "append only" patterns
• Distribute by using a hash etc. as a prefix

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• Server busy: load on a single partition has exceeded the limits
• Load balance partitions to meet traffic needs

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
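As a rough sketch of those three points together, reusing the AttendeeEntity type and the table and key names from the later samples:

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQuerySample
{
    // One fresh context per logical operation, with IgnoreResourceNotFoundException
    // set so a point query for a missing entity yields no result instead of throwing.
    public static AttendeeEntity TryGetAttendee(CloudTableClient tableClient, string email)
    {
        TableServiceContext context = tableClient.GetDataServiceContext();
        context.IgnoreResourceNotFoundException = true;

        var query = from a in context.CreateQuery<AttendeeEntity>("Attendees")
                    where a.PartitionKey == "SummerSchool" && a.RowKey == email
                    select a;

        foreach (AttendeeEntity attendee in query)
        {
            return attendee;            // point query: at most one entity comes back
        }
        return null;                    // not found, and no exception was thrown
    }
}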

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages; messages must be serializable as XML
• Messages are limited to 8 KB in size
• Commonly use the work ticket pattern (sketched below)
• Why not simply use a table?
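A minimal sketch of the work ticket pattern with the StorageClient queue and blob APIs; the queue and container names, the 30-second visibility timeout, and the text payload are illustrative assumptions:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class WorkTicketSample
{
    private readonly CloudQueue queue;
    private readonly CloudBlobContainer container;

    public WorkTicketSample(CloudStorageAccount account)
    {
        queue = account.CreateCloudQueueClient().GetQueueReference("tasks");
        queue.CreateIfNotExist();
        container = account.CreateCloudBlobClient().GetContainerReference("taskdata");
        container.CreateIfNotExist();
    }

    // Producer (e.g. a web role): store the payload, enqueue the ticket.
    public void Submit(string ticketName, string payload)
    {
        container.GetBlobReference(ticketName).UploadText(payload);
        queue.AddMessage(new CloudQueueMessage(ticketName));   // well under the 8 KB limit
    }

    // Consumer (e.g. a worker role): dequeue a ticket, fetch the payload.
    public string TakeNext()
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // visibility timeout
        if (msg == null)
        {
            return null;                                                     // queue is empty
        }
        string payload = container.GetBlobReference(msg.AsString).DownloadText();
        queue.DeleteMessage(msg);                                            // done, remove the ticket
        return payload;
    }
}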

Queue Terminology

Message Lifecycle

[Diagram: a web role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; worker roles call GetMessage (with a visibility timeout) to retrieve messages and RemoveMessage to delete them once processed.]

Adding a message (REST):

  POST http://myaccount.queue.core.windows.net/myqueue/messages

The response to retrieving a message (REST):

  HTTP/1.1 200 OK
  Transfer-Encoding: chunked
  Content-Type: application/xml
  Date: Tue, 09 Dec 2008 21:04:30 GMT
  Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

  <?xml version="1.0" encoding="utf-8"?>
  <QueueMessagesList>
    <QueueMessage>
      <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
      <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
      <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
      <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
      <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
      <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
    </QueueMessage>
  </QueueMessagesList>

Deleting a message (REST):

  DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1.
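A minimal sketch of that loop for a worker role, using the StorageClient queue API; the 1-second starting interval and the 64-second cap are assumptions for illustration:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class PollingSample
{
    // Worker-role polling loop with truncated exponential back-off:
    // every empty poll doubles the sleep interval, a successful poll resets it.
    public static void PollQueue(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(64);        // truncation point (assumed)

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                ProcessMessage(msg);                            // application-specific work
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);             // success: reset the interval
            }
            else
            {
                Thread.Sleep(interval);                         // empty poll: wait...
                double doubled = interval.TotalSeconds * 2;     // ...then double, up to the cap
                interval = TimeSpan.FromSeconds(Math.Min(doubled, maxInterval.TotalSeconds));
            }
        }
    }

    private static void ProcessMessage(CloudQueueMessage msg)
    {
        // Placeholder for the real message handling.
    }
}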


Removing Poison Messages

[Diagram sequence: producers P1 and P2 feed a queue; consumers C1 and C2 dequeue with a 30-second visibility timeout.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1), removing the poison message

Queues Recap

Make message processing idempotent
• No need to deal with failures

Do not rely on order
• Invisible messages result in out-of-order processing

Use DequeueCount to remove poison messages
• Enforce a threshold on a message's dequeue count (see the sketch below)

Use a blob to store message data, with a reference in the message
• For messages larger than 8 KB
• Batch messages
• Garbage collect orphaned blobs

Use message count to scale
• Dynamically increase or reduce workers
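A minimal sketch of the DequeueCount guard from the recap and the walkthrough above; the threshold of 2 matches the walkthrough, while the 30-second timeout and the handling hooks are illustrative:

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageSample
{
    // Guard against poison messages: if a message keeps reappearing because its
    // consumers crash, delete it once its dequeue count passes a threshold.
    public static void ProcessNextMessage(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null)
        {
            return;                                 // nothing to do
        }

        if (msg.DequeueCount > 2)
        {
            // Optionally copy the message to a blob or table for later inspection.
            queue.DeleteMessage(msg);               // remove the poison message
            return;
        }

        ProcessMessage(msg);                        // idempotent processing
        queue.DeleteMessage(msg);
    }

    private static void ProcessMessage(CloudQueueMessage msg)
    {
        // Placeholder for the real, idempotent message handling.
    }
}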

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
            {
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            }
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        }
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
    {
        InitContainer();
    }
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
    {
        InitContainer();
    }
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

Code – Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
            {
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            }
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
            {
                context = Client.GetDataServiceContext();
            }
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (!allAttendees.ContainsKey(email))
    {
        return;
    }
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (allAttendees.ContainsKey(email))
    {
        return allAttendees[email];
    }
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
    {
        Context.SaveChanges();
    }
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
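A minimal sketch of both flavors with the .NET 4 Task Parallel Library; the work items and the placeholder methods are illustrative, not part of the deck:

using System.Linq;
using System.Threading.Tasks;

public static class ConcurrencySketch
{
    public static void Run()
    {
        int[] workItems = Enumerable.Range(0, 1000).ToArray();

        // Data parallelism: partition the items across the available cores.
        Parallel.ForEach(workItems, item => ProcessItem(item));

        // Task parallelism: run independent jobs side by side.
        Task io = Task.Factory.StartNew(() => DownloadInputs());
        Task cpu = Task.Factory.StartNew(() => CrunchNumbers());
        Task.WaitAll(io, cpu);
    }

    private static void ProcessItem(int item) { /* CPU-bound work */ }
    private static void DownloadInputs() { /* network/IO-bound work */ }
    private static void CrunchNumbers() { /* CPU-bound work */ }
}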

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in poor user experience
• There is a trade-off between performance (the risk of failure or poor user experience from not having excess capacity) and cost (idling VMs)

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
   • (A sketch of enabling gzip in ASP.NET follows this list)
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]
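One possible way to apply point 1 in an ASP.NET web role, sketched with the framework's GZipStream; enabling IIS compression is an alternative, and hooking this from Application_BeginRequest is an assumption rather than something the slides prescribe:

using System.IO.Compression;
using System.Web;

public static class CompressionSample
{
    // Wrap the ASP.NET response stream in a GZipStream when the client
    // advertises gzip support (e.g. call this from Application_BeginRequest).
    public static void CompressResponse(HttpContext context)
    {
        string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;
        if (acceptEncoding.Contains("gzip"))
        {
            context.Response.Filter = new GZipStream(context.Response.Filter,
                                                     CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}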

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

It is computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 to 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

It involves large volumes of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

AzureBLAST: a parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and test runs (see the sketch below)
  • Watch out for the 2-hour maximum limitation
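As a rough sketch of that guidance: the estimate function, the 1.5x margin, and the 115-minute cap below are placeholders for the profiling-based numbers the slides describe; only the 2-hour ceiling comes from the deck.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class VisibilityTimeoutSample
{
    // Pick a visibilityTimeout from an estimated task run time, with some margin,
    // while staying under the 2-hour maximum the queue service allows.
    public static CloudQueueMessage DequeueBlastTask(CloudQueue queue, long pairBasesInPartition)
    {
        double estimatedMinutes = EstimateMinutes(pairBasesInPartition);
        double timeoutMinutes = Math.Min(estimatedMinutes * 1.5, 115);   // margin, capped below 2 h
        return queue.GetMessage(TimeSpan.FromMinutes(timeoutMinutes));
    }

    private static double EstimateMinutes(long pairBases)
    {
        // Placeholder: in practice this comes from profiling test runs.
        return pairBases / 1000000.0;
    }
}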

[Diagram: a splitting task fans the input out into many BLAST tasks, which are then combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Diagram: AzureBLAST architecture. A web role hosts the web portal and web service for job registration; a job management role runs the job scheduler, the scaling engine, and the database-updating role; worker roles pull work from a global dispatch queue. Azure tables hold the job registry, Azure blobs hold the NCBI/BLAST databases and temporary data, and a splitting task fans the input out into BLAST tasks that a merging task combines.]

The job portal is an ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the job portal (web portal and web service) hands job registrations to the job scheduler, which works with the scaling engine and the job registry.]

Case study: R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time

Case study: discovering homologs
• Discover the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
  • The protein database is large (4.2 GB)
  • In total, 9,865,668 sequences to be queried
  • Theoretically 100 billion sequence comparisons
• Performance estimation
  • Based on sampling runs on one extra-large Azure instance
  • Would require 3,216,731 minutes (6.1 years) on one desktop
• One of the biggest BLAST jobs as far as we know
  • Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When loads were imbalanced, the load was redistributed manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the entries below; otherwise something is wrong (e.g., a task failed to complete).

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

Observations from the logs:
• All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes per group) – this is an Update Domain at work
• 35 nodes experienced blob writing failures at the same time
• West Europe Datacenter: 30,976 tasks were completed, then the job was killed – a reasonable guess is that the Fault Domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish Proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = \frac{\Delta\, R_n + \rho_a c_p\, \delta q\, g_a}{\left(\Delta + \gamma\left(1 + g_a / g_s\right)\right)\lambda_v}

                                                                    Penman-Monteith (1964)
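For concreteness, the formula can be transcribed directly as a small helper. This is only an illustrative sketch (the class, method, and parameter names are mine, not part of the MODISAzure code); the units are those listed in the definitions above.

static class PenmanMonteith
{
    // Direct transcription of the Penman-Monteith formula above.
    public static double ComputeET(
        double delta,    // Δ  : rate of change of saturation specific humidity (Pa K-1)
        double rn,       // Rn : net radiation (W m-2)
        double rhoA,     // ρa : dry air density (kg m-3)
        double cp,       // cp : specific heat capacity of air (J kg-1 K-1)
        double dq,       // δq : vapor pressure deficit (Pa)
        double ga,       // ga : conductivity of air (m s-1)
        double gs,       // gs : conductivity of plant stoma (m s-1)
        double lambdaV,  // λv : latent heat of vaporization (J g-1)
        double gamma)    // γ  : psychrometric constant (~66 Pa K-1)
    {
        return (delta * rn + rhoA * cp * dq * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}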

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

Architecture (diagram): Scientists interact with the AzureMODIS Service Web Role portal, which places their requests on a Request Queue. Download, Reprojection, Reduction 1, and Reduction 2 queues connect the pipeline's Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; the Data Collection Stage pulls from the Source Imagery Download Sites, Source Metadata is kept alongside, and Scientific Results are made available to the scientists for download.

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

(Diagram) A <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue. The Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks to the <PipelineStage>Task Queue.

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage and to/from local Azure Worker instance files

(Diagram) The Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read their inputs from <Input>Data Storage.

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

(Diagram) Reprojection flow: a Reprojection Request is persisted as ReprojectionJobStatus (each entity specifies a single reprojection job request) and queued on the Job Queue; the Service Monitor (Worker Role) parses it and persists ReprojectionTaskStatus entities (each entity specifies a single reprojection task, i.e. a single tile), dispatching the tasks to the Task Queue for GenericWorker (Worker Role) instances. Tasks point to table metadata: query SwathGranuleMeta to get geo-metadata (e.g. boundaries) for each swath tile, and ScanTimeList to get the list of satellite scan times that cover a target tile. Workers read from Swath Source Data Storage and write to Reprojection Data Storage.

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Cost breakdown (per-stage figures from the pipeline diagram):
• Data Collection Stage – 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers: $50 upload, $450 storage
• Reprojection Stage – 400 GB, 45K files, 3500 hours, 20-100 workers: $420 CPU, $60 download
• Derivation Reduction Stage – 5-7 GB, 55K files, 1800 hours, 20-100 workers: $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage – <10 GB, ~1K files, 1800 hours, 20-100 workers: $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                    • Day 2 - Azure as a PaaS
                                                                    • Day 2 - Applications

                                                                      Scalability Targets

Storage Account
• Capacity – up to 100 TB
• Transactions – up to a few thousand requests per second
• Bandwidth – up to a few hundred megabytes per second

Single Queue/Table Partition
• Up to 500 transactions per second

Single Blob Partition
• Throughput up to 60 MB/s

To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential back-off (a retry sketch follows).
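A minimal back-off wrapper for the '503 Server Busy' case might look like the following. This is a generic sketch, not the StorageClient library's built-in retry policies; the initial delay and retry count are illustrative assumptions, and real code should catch only the specific throttling exception.

using System;
using System.Threading;

public static class StorageRetry
{
    // Retries an operation with exponential back-off when it throws.
    public static T ExecuteWithBackoff<T>(Func<T> operation, int maxAttempts)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);   // initial delay (assumed)
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception)   // in real code, catch the "server busy" exception specifically
            {
                if (attempt >= maxAttempts)
                    throw;                                  // give up after maxAttempts
                Thread.Sleep(delay);
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);  // double each retry
            }
        }
    }
}

// Usage (blobHelper is the helper class shown later in these slides):
//     string text = StorageRetry.ExecuteWithBackoff(() => blobHelper.GetBlobText("SummerSchoolAttendees.txt"), 5);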

Partitions and Partition Ranges

Partition range 1 (Action – Animation):

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
...                     | ...                      | ...       | ...
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006

Partition range 2 (Comedy – War):

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | ...       | 1999
...                     | ...                      | ...       | ...
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
...                     | ...                      | ...       | ...
War                     | Defiance                 | ...       | 2008

Full table:

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
...                     | ...                      | ...       | ...
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
...                     | ...                      | ...       | ...
Comedy                  | Office Space             | ...       | 1999
...                     | ...                      | ...       | ...
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
...                     | ...                      | ...       | ...
War                     | Defiance                 | ...       | 2008

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously

You will get a continuation token when:
• a response has reached the maximum of 1,000 rows
• the query reaches the end of a partition range boundary
• the query reaches the maximum of 5 seconds of execution time

(A sketch of handling them follows.)
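One way to follow continuation tokens explicitly with the WCF Data Services client is sketched below. The context, table name, and entity type mirror the attendee example later in these slides and are assumptions; in practice, the StorageClient's AsTableServiceQuery() can follow continuations for you.

// Sketch: page through a table query, following continuation tokens manually.
// Assumes a TableServiceContext 'context' and the AttendeeEntity type shown later.
var query = (DataServiceQuery<AttendeeEntity>)context.CreateQuery<AttendeeEntity>("Attendees");

DataServiceQueryContinuation<AttendeeEntity> token = null;
do
{
    QueryOperationResponse<AttendeeEntity> response = (token == null)
        ? (QueryOperationResponse<AttendeeEntity>)query.Execute()
        : context.Execute(token);

    foreach (AttendeeEntity attendee in response)
    {
        // process each entity here
    }

    token = response.GetContinuation();   // null once there are no more pages
} while (token != null);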

Tables Recap
• Select PartitionKey and RowKey that help scale – efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns – distribute by using a hash etc. as a prefix
• Always handle continuation tokens – expect continuation tokens for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "server busy" means the load on a single partition has exceeded the limits; partitions are load balanced to meet traffic needs

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist – use IgnoreResourceNotFoundException (a sketch follows)
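A hedged sketch of that advice: a fresh context per logical operation, IgnoreResourceNotFoundException enabled, and a point query on PartitionKey + RowKey. The table name and key values mirror the attendee example later in these slides; the method itself is mine, not part of the sample.

// Sketch: point query that yields null instead of an error when the entity is missing.
// (requires using System.Linq and Microsoft.WindowsAzure.StorageClient)
public AttendeeEntity TryGetAttendee(CloudTableClient tableClient, string email)
{
    TableServiceContext ctx = tableClient.GetDataServiceContext();  // new context for this operation
    ctx.IgnoreResourceNotFoundException = true;                     // missing row -> no exception

    return ctx.CreateQuery<AttendeeEntity>("Attendees")
              .Where(a => a.PartitionKey == "SummerSchool" && a.RowKey == email)
              .FirstOrDefault();
}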

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• This can aid in scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size
• Commonly used with the work ticket pattern (a sketch follows this list)
• Why not simply use a table?
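A minimal work-ticket sketch using the StorageClient queue API is shown below; the queue name and the blob path used as the ticket are illustrative assumptions, and AccountInformation is the credentials class shown later in these slides.

// Producer (e.g. in a Web Role): enqueue a small "work ticket" that points at the real data.
CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("workitems");                // illustrative queue name
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage("school/SummerSchoolAttendees.txt"));  // ticket = blob path

// Consumer (e.g. in a Worker Role): dequeue, process, then delete.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));           // 30 s visibility timeout
if (msg != null)
{
    string blobPath = msg.AsString;   // fetch the heavy data from blob storage using this path
    // ... process the blob ...
    queue.DeleteMessage(msg);         // delete only after processing succeeds
}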

Queue Terminology

Message Lifecycle (diagram): a Web Role calls PutMessage to add messages (Msg 1–4) to the Queue; Worker Roles call GetMessage (with a visibility timeout) to retrieve a message and RemoveMessage to delete it once it has been processed.

PutMessage request:

POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage request:

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

                                                                      Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll increases the interval by 2x, and a successful poll sets the interval back to 1. A sketch follows.
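A sketch of that polling loop; the interval bounds and visibility timeout are illustrative assumptions, and 'queue' is a CloudQueue as in the work-ticket sketch above.

// Truncated exponential back-off polling of a queue.
TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromSeconds(60);   // truncation point (assumed)

while (true)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        // ... process the message ...
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);        // success: reset the interval to 1 second
    }
    else
    {
        Thread.Sleep(interval);                    // empty poll: wait, then double the interval
        interval = TimeSpan.FromSeconds(Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
    }
}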


                                                                      Removing Poison Messages

Two producers (P1, P2) add messages to the queue; two consumers (C1, C2) retrieve them:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2


                                                                      Removing Poison Messages

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1


                                                                      Removing Poison Messages

1. C1: Dequeue(Q, 30 sec) → msg 1
2. C2: Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)
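The DequeueCount check in steps 12–13 can be written directly against the dequeued message. The threshold of 2 comes from the walkthrough above; everything else is an illustrative sketch using the same 'queue' object as before.

// Poison-message guard inside the consumer loop.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // The message has already failed repeatedly: remove it instead of retrying forever.
        // (Optionally log it or copy it to a blob/table for later inspection first.)
        queue.DeleteMessage(msg);
    }
    else
    {
        // ... normal processing, then DeleteMessage on success ...
    }
}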

Queues Recap
• Make message processing idempotent – then there is no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order processing
• Use dequeue count to remove poison messages – enforce a threshold on a message's dequeue count
• For messages > 8 KB, or to batch messages – use a blob to store the message data, with a reference in the message; garbage-collect orphaned blobs
• Use message count to scale – dynamically increase/reduce workers

                                                                      Windows Azure Storage Takeaways

                                                                      Blobs

                                                                      Drives

                                                                      Tables

                                                                      Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net


                                                                      A Quick Exercise

…Then let's look at some code and some tools


Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}


Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();

            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }
}


Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").


Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}


Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt


Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}


Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}


Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}


Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}


Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))   // nothing to delete if we don't know this attendee
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}


Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}


Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.


Tools – Fiddler2

                                                                      Best Practices

                                                                      Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                                      Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using up its CPU
• Balance between using up the CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                      Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (a short sketch follows this list)
  • Data parallelism
  • Task parallelism
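For example, with the .NET 4 Task Parallel Library. This is a generic sketch; the collection and methods used (inputTiles, ProcessTile, DownloadNextBatch, ComputeCurrentBatch) are hypothetical placeholders, not part of any sample in these slides.

// Data parallelism: spread a CPU-bound loop across the VM's cores.
// (requires using System.Threading.Tasks)
Parallel.ForEach(inputTiles, tile =>
{
    ProcessTile(tile);                     // hypothetical per-item work
});

// Task parallelism: run independent pieces of work concurrently.
Task download = Task.Factory.StartNew(() => DownloadNextBatch());
Task compute  = Task.Factory.StartNew(() => ComputeCurrentBatch());
Task.WaitAll(download, compute);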

                                                                      Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                      Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs

Performance vs. Cost

                                                                      Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

                                                                      Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

                                                                      Sending fewer things over the wire often means getting fewer things from storage

                                                                      Saving bandwidth costs often lead to savings in other places

                                                                      Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (see the sketch after this list)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
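As a sketch of point 1, an ASP.NET application can gzip its responses with the framework's GZipStream whenever the client advertises support; hooking Application_BeginRequest in Global.asax is one common place to do it (illustrative only, and a production version would also skip content types that are already compressed):

// Global.asax.cs -- compress responses on the fly when the browser supports gzip.
using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;

        if (acceptEncoding.Contains("gzip"))
        {
            // Wrap the output stream; ASP.NET writes through the filter chain.
            context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
            context.Response.AppendHeader("Vary", "Accept-Encoding");
        }
    }
}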

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images]

Best Practices Summary

Doing 'less' is the key to saving costs

Measure everything

Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Questions and Discussion…

                                                                      Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large volume data

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partition: load imbalance
• Small partition: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and test runs
• Watch out for the 2-hour maximum limitation (see the sketch below)
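A minimal sketch of this practice (the class name, queue usage, and the per-pair-base rate are illustrative assumptions derived from profiling, not values from the project): compute the timeout from the partition size, pad it, and clamp it to the 2-hour ceiling before dequeuing.

// Sketch: choose a visibility timeout from an estimated task run time.
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastTaskDequeuer
{
    private const double SecondsPerMillionPairBases = 40.0;              // assumed, from test runs
    private static readonly TimeSpan MaxVisibility = TimeSpan.FromHours(2); // queue service ceiling

    public static CloudQueueMessage GetNextTask(CloudQueue taskQueue, double millionPairBasesInPartition)
    {
        double estimatedSeconds = millionPairBasesInPartition * SecondsPerMillionPairBases;
        // Pad the estimate so a slightly slow task is not handed to a second worker,
        // but never exceed the 2-hour maximum the queue service allows.
        TimeSpan visibility = TimeSpan.FromSeconds(
            Math.Min(estimatedSeconds * 1.5, MaxVisibility.TotalSeconds));
        return taskQueue.GetMessage(visibility);
    }
}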

[Diagram: a splitting task fans out to many BLAST tasks, which feed a merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role exposes the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and database-updating role; worker instances take splitting, BLAST, and merging tasks from a global dispatch queue; the job registry lives in Azure Tables, and Azure Blob storage holds the BLAST/NCBI databases, temporary data, etc.]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the web portal and web service accept job registrations, which flow through the job portal to the job scheduler, scaling engine, and job registry]

R. palustris as a platform for H2 production: Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM), four datacenters: US (2), Western and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalances occur, redistribute the load manually

[Diagram: instance allocation across the 8 deployments, roughly 50-62 extra-large VMs each]

• Total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in groups (~30 mins, ~6 nodes per group); this is an Update Domain

35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed and the job was killed

A reasonable guess: the Fault Domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." - Irish Proverb

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J/g)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
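For concreteness, here is the formula above transcribed into a small C# helper (a sketch only: the class and parameter names are mine, and no unit conversion or input validation is performed):

// Direct transcription of the Penman-Monteith formula shown above.
public static class Evapotranspiration
{
    // delta: rate of change of saturation specific humidity with air temperature (Pa/K)
    // netRadiation: Rn (W/m^2), airDensity: rho_a (kg/m^3), specificHeat: c_p (J/(kg K))
    // vaporPressureDeficit: delta_q (Pa), airConductivity: g_a (m/s),
    // stomatalConductivity: g_s (m/s), latentHeat: lambda_v (J/g), gamma: ~66 Pa/K
    public static double PenmanMonteith(
        double delta, double netRadiation, double airDensity, double specificHeat,
        double vaporPressureDeficit, double airConductivity, double stomatalConductivity,
        double latentHeat, double gamma = 66.0)
    {
        double numerator = delta * netRadiation
                         + airDensity * specificHeat * vaporPressureDeficit * airConductivity;
        double denominator = (delta + gamma * (1.0 + airConductivity / stomatalConductivity)) * latentHeat;
        return numerator / denominator;
    }
}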

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role portal; a request queue feeds the data collection stage, which pulls source imagery from NASA download sites and records source metadata; download, reprojection, reduction 1, and reduction 2 queues drive the reprojection, derivation reduction, and analysis reduction stages; scientific results are available for download]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues requests to the appropriate Download, Reprojection, or Reduction job queue

• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables (see the sketch below)
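As a sketch of what such persisted status could look like (the entity class and its properties are hypothetical, not the project's actual schema), a task-status row can be modeled with the storage client's TableServiceEntity:

// Hypothetical task-status entity; partitioning by job ID lets all tasks of
// one job be scanned with a single-partition query.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class TaskStatusEntity : TableServiceEntity
{
    public TaskStatusEntity() { }

    public TaskStatusEntity(string jobId, string taskId)
    {
        PartitionKey = jobId;   // one partition per job
        RowKey = taskId;        // one row per task
        // The table service rejects dates before 1601, so give timestamps a valid default.
        StartedUtc = DateTime.UtcNow;
        CompletedUtc = DateTime.UtcNow;
    }

    public string Stage { get; set; }   // e.g. "Reprojection"
    public string State { get; set; }   // e.g. "Queued", "Running", "Done", "Failed"
    public int RetryCount { get; set; }
    public DateTime StartedUtc { get; set; }
    public DateTime CompletedUtc { get; set; }
}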

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> job queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches to the <PipelineStage> task queue]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshals all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> task queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a reprojection request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the task queue consumed by GenericWorker (Worker Role) instances; task entities point into reprojection data storage, swath source data storage, and the SwathGranuleMeta and ScanTimeList tables]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate student rates

[The pipeline diagram (AzureMODIS Service Web Role portal, queues, and stages) is repeated with per-stage cost annotations, summarized here:]

Data Collection Stage
• 400-500 GB, 60K files
• 10 MB/sec, 11 hours, <10 workers
• $50 upload, $450 storage

Reprojection Stage
• 400 GB, 45K files
• 3500 compute hours, 20-100 workers
• $420 cpu, $60 download

Derivation Reduction Stage
• 5-7 GB, 55K files
• 1800 compute hours, 20-100 workers
• $216 cpu, $1 download, $6 storage

Analysis Reduction Stage
• <10 GB, ~1K files
• 1800 compute hours, 20-100 workers
• $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns," but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                      • Day 2 - Azure as a PaaS
                                                                      • Day 2 - Applications

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
…                       | …                        | …         | …
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

Partitions and Partition Ranges

Key Selection: Things to Consider

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips (see the sketch below)
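A minimal sketch of an entity group transaction using the WCF Data Services client (the MovieEntity class and the "Movies" table reuse the movie example above and are hypothetical; the table is assumed to already exist). The batch succeeds or fails as a unit because every entity shares the same PartitionKey:

// Entity group transaction sketch: all entities share PartitionKey "Action",
// so SaveChangesOptions.Batch sends them as one atomic batch (max 100 entities).
using System.Data.Services.Client;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class MovieEntity : TableServiceEntity
{
    public MovieEntity() { }
    public MovieEntity(string category, string title)
    {
        PartitionKey = category;   // Category
        RowKey = title;            // Title
    }
    public int ReleaseDate { get; set; }
}

public static class MovieBatchWriter
{
    public static void AddActionMovies(CloudStorageAccount account)
    {
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        context.AddObject("Movies", new MovieEntity("Action", "Fast & Furious") { ReleaseDate = 2009 });
        context.AddObject("Movies", new MovieEntity("Action", "The Bourne Ultimatum") { ReleaseDate = 2007 });

        // One round trip, one atomic unit, all within a single partition.
        context.SaveChanges(SaveChangesOptions.Batch);
    }
}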

Expect Continuation Tokens - Seriously

• Maximum of 1,000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query

(See the sketch below for handling them.)
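A minimal sketch of draining a query across continuation tokens with the WCF Data Services client (reusing the hypothetical MovieEntity and "Movies" table from the batch example; production code would add retry and back-off):

// Keep executing until the server stops returning a continuation token,
// so a partial result set is never mistaken for the full answer.
using System;
using System.Data.Services.Client;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class MovieReader
{
    public static void PrintCategory(CloudStorageAccount account, string category)
    {
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        DataServiceQuery<MovieEntity> query = (DataServiceQuery<MovieEntity>)
            context.CreateQuery<MovieEntity>("Movies")
                   .Where(m => m.PartitionKey == category);

        QueryOperationResponse<MovieEntity> response = (QueryOperationResponse<MovieEntity>)query.Execute();
        while (true)
        {
            foreach (MovieEntity movie in response)
                Console.WriteLine("{0} ({1})", movie.RowKey, movie.ReleaseDate);

            // A null token means the server has returned everything.
            DataServiceQueryContinuation<MovieEntity> token = response.GetContinuation();
            if (token == null) break;
            response = context.Execute(token);
        }
    }
}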

Tables Recap

Select PartitionKey and RowKey that help scale
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Avoid "append only" patterns
• Distribute by using a hash etc. as a prefix

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• Server busy
• Load on a single partition has exceeded the limits
• Load balance partitions to meet traffic needs

WCF Data Services

• Use a new context for each logical operation

• AddObject/AttachTo can throw an exception if the entity is already being tracked

• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException

Queues: Their Unique Role in Building Reliable, Scalable Applications

• You want roles that work closely together but are not bound together; tight coupling leads to brittleness

• This can aid in scaling and performance

• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Limited to 8 KB in size

• Commonly used with the work ticket pattern (see the sketch below)

• Why not simply use a table?
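A minimal sketch of the work ticket pattern (the class, queue, container, and blob names are illustrative): the web role uploads the payload to blob storage and enqueues only a small "ticket" referencing it; a worker dequeues the ticket, fetches the blob, and deletes the message after successful processing.

// Work ticket pattern sketch: the queue message carries a reference, not the data.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class WorkTickets
{
    // Web role side: store the payload, enqueue a small ticket pointing at it.
    public static void SubmitWork(CloudStorageAccount account, byte[] payload)
    {
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("work-items");
        container.CreateIfNotExist();

        string blobName = Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadByteArray(payload);

        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("work-queue");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(blobName));   // the ticket is well under 8 KB
    }

    // Worker role side: dequeue a ticket, fetch and process the blob, then delete the message.
    public static void ProcessNext(CloudStorageAccount account, Action<byte[]> process)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("work-queue");
        CloudQueueMessage ticket = queue.GetMessage();
        if (ticket == null) return;   // nothing to do right now

        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("work-items");
        byte[] payload = container.GetBlobReference(ticket.AsString).DownloadByteArray();
        process(payload);

        queue.DeleteMessage(ticket);  // only after successful processing
    }
}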

Queue Terminology

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages to the queue; Worker Roles call GetMessage (with a timeout) to retrieve messages and RemoveMessage to delete them once processed]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1 (see the sketch below).
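A minimal sketch of that polling loop (the class name and the 1 s / 60 s bounds are illustrative choices):

// Truncated exponential back-off polling: double the sleep after every empty
// poll, reset after a successful one, and cap the interval.
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class QueuePoller
{
    public static void PollForever(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        TimeSpan minInterval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(60);   // truncation point (assumed value)
        TimeSpan interval = minInterval;

        while (true)
        {
            CloudQueueMessage message = queue.GetMessage();
            if (message != null)
            {
                process(message);
                queue.DeleteMessage(message);
                interval = minInterval;                    // success: poll eagerly again
            }
            else
            {
                Thread.Sleep(interval);                    // empty poll: back off
                interval = TimeSpan.FromSeconds(
                    Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
            }
        }
    }
}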


Removing Poison Messages

[Diagram: producers P1 and P2 feed a queue consumed by workers C1 and C2; the numbered steps below walk through the scenario]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount > 2, so it is treated as a poison message
13. C1: DeleteMessage(Q, msg 1)
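A minimal sketch of the dequeue-count check from step 12 (the threshold and the idea of parking poison messages in a separate queue are illustrative choices, not prescribed by the service):

// Poison message handling sketch: give a message a few tries, then move it
// out of the way so it cannot block the queue forever.
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageHandler
{
    private const int MaxDequeueCount = 3;   // assumed threshold

    public static bool TryProcess(CloudQueue queue, CloudQueue poisonQueue, CloudQueueMessage message,
                                  System.Action<CloudQueueMessage> process)
    {
        if (message.DequeueCount > MaxDequeueCount)
        {
            // Park the payload for later inspection, then remove the poison message.
            poisonQueue.AddMessage(new CloudQueueMessage(message.AsString));
            queue.DeleteMessage(message);
            return false;
        }

        process(message);               // may throw; the message will simply reappear later
        queue.DeleteMessage(message);   // delete only after successful processing
        return true;
    }
}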

Queues Recap

• Make message processing idempotent – then there is no need to deal with failures
• Do not rely on order – invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages – enforce a threshold on a message's dequeue count (see the sketch below)
• Messages > 8 KB – use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs
• Use the message count to scale – dynamically increase/reduce the number of workers
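To make the dequeue-count rule concrete, here is a minimal consumer sketch using the Windows Azure StorageClient library. ProcessMessage is a hypothetical, idempotent handler, and the threshold of 2 mirrors the scenario above; this is an illustration, not the talk's own code.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageExample
{
    public static void ConsumeOneMessage(CloudQueue queue)
    {
        // Hide the message from other consumers for 30 seconds while we work on it
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null)
            return;                          // the queue is empty

        if (msg.DequeueCount > 2)
        {
            // Poison message: it has already failed repeatedly, so remove it
            queue.DeleteMessage(msg);
            return;
        }

        ProcessMessage(msg);                 // hypothetical, idempotent handler
        queue.DeleteMessage(msg);            // delete only after successful processing
    }

    private static void ProcessMessage(CloudQueueMessage msg)
    {
        // Idempotent work goes here (placeholder)
    }
}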

                                                                        Windows Azure Storage Takeaways

                                                                        Blobs

                                                                        Drives

                                                                        Tables

                                                                        Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                        49

                                                                        A Quick Exercise

…Then let's look at some code and some tools

                                                                        50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                        51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                        52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                        53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

                                                                        54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                        55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    ...
}

                                                                        56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                        57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                        58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred
    }
}

                                                                        59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                        60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                        61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                        62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                        63

Tools – Fiddler2

                                                                        Best Practices

                                                                        Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                                        Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each of which barely uses its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                        Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
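As a small illustration of the last two bullets, here is a sketch of data and task parallelism with the .NET 4 Task Parallel Library; the work items and the three helper methods are illustrative placeholders, not code from the talk.

using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrencyExamples
{
    // Data parallelism: spread independent work items across all cores
    public static void ProcessAll(IEnumerable<string> workItems)
    {
        Parallel.ForEach(workItems, item => ProcessItem(item));
    }

    // Task parallelism: run two distinct operations concurrently
    public static void DoHousekeeping()
    {
        Task download = Task.Factory.StartNew(() => DownloadInputs());
        Task cleanup  = Task.Factory.StartNew(() => GarbageCollectOrphanedBlobs());
        Task.WaitAll(download, cleanup);
    }

    private static void ProcessItem(string item) { /* CPU-bound work */ }
    private static void DownloadInputs() { /* network-bound work */ }
    private static void GarbageCollectOrphanedBlobs() { /* storage-bound work */ }
}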

                                                                        Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                        Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale (see the sketch below)
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of having idling VMs
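A minimal sketch of the "use message count to scale" idea, assuming a work queue and illustrative minInstances/maxInstances/messagesPerInstance settings; actually changing the role's instance count is done through the Windows Azure Service Management API and is not shown here.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class ScalingSketch
{
    public static int ComputeTargetInstanceCount(CloudQueue taskQueue,
        int minInstances, int maxInstances, int messagesPerInstance)
    {
        // Approximate backlog; the count is eventually consistent, which is fine for scaling decisions
        int backlog = taskQueue.RetrieveApproximateMessageCount();
        int target = backlog / messagesPerInstance + 1;
        return Math.Max(minInstances, Math.Min(maxInstances, target));
    }
}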

(Diagram: balancing performance against cost)

                                                                        Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

                                                                        Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                        Compressing Content

1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

(Diagram: Uncompressed Content → Gzip → Compressed Content; Minify JavaScript, Minify CSS, Minify Images)
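A sketch of the first item (gzip all output content) for an ASP.NET web role, assuming a helper that is called early in the request pipeline; it only compresses when the browser advertises gzip support.

using System.IO.Compression;
using System.Web;

public static class ResponseCompression
{
    public static void CompressIfSupported(HttpContext context)
    {
        string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;
        if (acceptEncoding.Contains("gzip"))
        {
            // Wrap the output stream so everything written to the response is gzipped on the fly
            context.Response.Filter =
                new GZipStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}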

                                                                        Best Practices Summary

Doing 'less' is the key to saving costs

                                                                        Measure everything

                                                                        Know your application profile in and out

                                                                        Research Examples in the Cloud

…on another set of slides

                                                                        Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                        76

                                                                        Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                        77

Questions and Discussion…

                                                                        Thank you for hosting me at the Summer School

                                                                        BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1 / 2 / 4 / 8 for small, medium, large, and extra-large instance sizes (see the sketch below)

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data transfer overhead

Best practice: do test runs to profile, and set the partition size to mitigate the overhead.
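To illustrate the "-a" point, here is a sketch of launching legacy NCBI BLAST with one thread per core of the current instance; the database, input, and output paths are illustrative placeholders, and the flags shown are the standard legacy blastall options rather than code from AzureBLAST itself.

using System;
using System.Diagnostics;

public static class BlastLauncher
{
    public static void RunBlastPartition(string dbPath, string inputPath, string outputPath)
    {
        // 1 / 2 / 4 / 8 cores on small / medium / large / extra-large instances
        int cores = Environment.ProcessorCount;

        string arguments = string.Format(
            "-p blastp -d \"{0}\" -i \"{1}\" -o \"{2}\" -a {3}",
            dbPath, inputPath, outputPath, cores);

        using (Process blast = Process.Start("blastall.exe", arguments))
        {
            blast.WaitForExit();   // block until this partition's query finishes
        }
    }
}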

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation (see the sketch below)
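A sketch of setting the visibility timeout from an estimated task run time; EstimateTaskMinutes is a hypothetical helper that you would calibrate from profiling test runs on the partition's pair-base count, and the result is clamped to the 2-hour maximum mentioned above.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class BlastTaskTimeout
{
    public static CloudQueueMessage GetBlastTask(CloudQueue queue, long pairBasesInPartition)
    {
        // Too small: repeated computation; too large: a long wait after an instance failure
        double estimatedMinutes = EstimateTaskMinutes(pairBasesInPartition);
        double timeoutMinutes = Math.Max(1, Math.Min(estimatedMinutes * 1.5, 120));   // respect the 2-hour maximum

        return queue.GetMessage(TimeSpan.FromMinutes(timeoutMinutes));
    }

    private static double EstimateTaskMinutes(long pairBases)
    {
        // Placeholder scaling factor - calibrate this from your own test runs
        return pairBases / 1e7;
    }
}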

(Diagram: Splitting task → BLAST task, BLAST task, BLAST task, … → Merging task)

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

(Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration and the Job Scheduler; a Job Management Role runs the Scaling Engine and dispatches work through a global dispatch queue to Worker instances; Azure Tables hold the Job Registry; Azure Blob storage holds the BLAST/NCBI databases and temporary data; a database-updating role refreshes the NCBI databases; the workers execute the Splitting task → BLAST tasks → Merging task pipeline.)

ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

(Diagram: Web Portal and Web Service – job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry)

R. palustris as a platform for H2 production – Eric Shadt (SAGE), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed and the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J g^-1)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma air (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ · Rn + ρa · cp · δq · ga) / ((Δ + γ · (1 + ga / gs)) · λv)

Penman-Monteith (1964)
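As a worked version of the formula, here is a small helper that evaluates Penman-Monteith ET from the quantities defined above; it is a direct transcription of the equation for illustration, not code from the MODISAzure project.

public static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    public static double Evapotranspiration(
        double delta,        // Δ  (Pa K^-1)
        double rn,           // Rn (W m^-2)
        double rhoAir,       // ρa (kg m^-3)
        double cp,           // cp (J kg^-1 K^-1)
        double dq,           // δq (Pa)
        double ga,           // ga (m s^-1)
        double gs,           // gs (m s^-1)
        double lambdaV,      // λv (J g^-1)
        double gamma = 66.0) // γ ≈ 66 Pa K^-1
    {
        double numerator   = delta * rn + rhoAir * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}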

                                                                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Architecture diagram: the AzureMODIS Service Web Role Portal receives requests via a Request Queue; a Download Queue feeds the Data Collection Stage from the source imagery download sites; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation and Analysis Reduction Stages; source metadata is kept in tables; scientists download the scientific results.)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage>Task Queue.)

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

(Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read from <Input>Data Storage.)

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a sketch of this worker loop follows)
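A minimal sketch of that dispatch loop, with hypothetical helpers standing in for the project's actual task status table, blob marshalling, and sandboxed executable; the 30-minute visibility timeout and the retry limit of 3 mirror the bullets above.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class GenericWorkerSketch
{
    public static void RunLoop(CloudQueue taskQueue)
    {
        while (true)
        {
            CloudQueueMessage task = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (task == null) { Thread.Sleep(1000); continue; }

            if (task.DequeueCount > 3)
            {
                MarkTaskFailed(task);           // record the failure in the task status table
                taskQueue.DeleteMessage(task);  // give up after three retries
                continue;
            }

            DownloadInputsToLocalStorage(task); // marshal input blobs to local instance files
            RunScienceExecutable(task);         // run the sandboxed executable
            UploadResultsToBlobStorage(task);   // marshal results back to blob storage
            taskQueue.DeleteMessage(task);      // delete only once the task has succeeded
        }
    }

    // Placeholders standing in for the project's actual status table and marshalling code
    private static void MarkTaskFailed(CloudQueueMessage task) { }
    private static void DownloadInputsToLocalStorage(CloudQueueMessage task) { }
    private static void RunScienceExecutable(CloudQueueMessage task) { }
    private static void UploadResultsToBlobStorage(CloudQueueMessage task) { }
}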

(Diagram: a Reprojection Request flows through the Job Queue to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue consumed by GenericWorker (Worker Role) instances; tasks point to the Swath Source Data Storage and the Reprojection Data Storage.)

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates

(Cost-annotated pipeline diagram, AzureMODIS Service Web Role Portal)

Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
Derivation reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
Analysis reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                        • Day 2 - Azure as a PaaS
                                                                        • Day 2 - Applications

Key Selection: Things to Consider

Scalability
• Distribute load as much as possible
• Hot partitions can be load balanced
• PartitionKey is critical for scalability

Query Efficiency & Speed
• Avoid frequent large scans
• Parallelize queries
• Point queries are most efficient

Entity group transactions
• Transactions across a single partition
• Transaction semantics & reduced round trips

See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information

Expect Continuation Tokens – Seriously
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query

Tables Recap

Select PartitionKey and RowKey that help scale
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

Avoid "append only" patterns
• Distribute by using a hash etc. as a prefix (see the sketch below)

Always handle continuation tokens
• Expect continuation tokens for range queries

"OR" predicates are not optimized
• Execute the queries that form the "OR" predicates as separate queries

Implement a back-off strategy for retries
• "Server busy" can mean partitions are being load balanced to meet traffic needs, or that the load on a single partition has exceeded its limits
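The "distribute by using a hash as a prefix" tip can be made concrete with a small helper. This is a minimal sketch with assumed names (PartitionKeyHelper is not part of the storage client library): a short hash bucket is prepended to the natural key so inserts spread across partitions while point queries stay cheap.

// C# - hypothetical helper, shown only to illustrate the hash-prefix idea
using System;
using System.Security.Cryptography;
using System.Text;

public static class PartitionKeyHelper
{
    // Prefix the natural key with a small hash bucket so "append only" load
    // does not pile up on a single partition.
    public static string ToPartitionKey(string naturalKey, int bucketCount = 16)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(naturalKey));
            int bucket = hash[0] % bucketCount;
            return bucket.ToString("D2") + "_" + naturalKey;
        }
    }
}

Range queries then have to fan out across the buckets (and will return continuation tokens), which is the trade-off the recap above alludes to.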

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
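A minimal sketch of the two tips above, using the AttendeeEntity type and "Attendees" table shown later in this deck; the helper name and query shape are assumptions, not the deck's code.

// C#
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class PointQueryExample
{
    public static AttendeeEntity TryGetAttendee(CloudStorageAccount account, string email)
    {
        // A fresh context per logical operation: no stale tracked entities,
        // so AddObject/AttachTo will not trip over already-tracked instances.
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        // Without this flag, a point query on a missing row throws instead of returning null.
        context.IgnoreResourceNotFoundException = true;

        return (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a).FirstOrDefault();
    }
}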

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together; tight coupling leads to brittleness
• Loose coupling through a queue can aid scaling and performance
• A queue can hold an unlimited number of messages; messages must be serializable as XML
• Each message is limited to 8 KB in size
• Commonly used with the work ticket pattern
• Why not simply use a table?

Queue Terminology

Message Lifecycle
[Diagram: a web role calls PutMessage to enqueue messages (Msg 1-4); worker roles call GetMessage (with a visibility timeout) to dequeue a message and RemoveMessage to delete it once processing completes.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1. A sketch follows below.

[Diagram: two consumers, C1 and C2, polling a queue with this strategy.]
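A minimal sketch of the truncated exponential back-off polling loop described above, written against the StorageClient queue API (class and constant names are assumptions): each empty poll doubles the sleep interval up to a cap, and a successful poll resets it to one second.

// C#
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class BackOffPoller
{
    public void Run(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);       // starting poll interval
        TimeSpan maxInterval = TimeSpan.FromSeconds(64);   // truncation point

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                ProcessMessage(msg);                        // application-specific work
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);         // success: reset the interval
            }
            else
            {
                Thread.Sleep(interval);                     // empty poll: back off
                double doubled = interval.TotalSeconds * 2;
                interval = TimeSpan.FromSeconds(Math.Min(doubled, maxInterval.TotalSeconds));
            }
        }
    }

    private void ProcessMessage(CloudQueueMessage msg) { /* application-specific */ }
}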

Removing Poison Messages

[Diagram, built up across three slides: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them. The small numbers in the figure are message IDs and their dequeue counts.]

Event sequence recovered from the build:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarted
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1); the poison message is removed

(A DequeueCount-based sketch follows below.)
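A minimal sketch of the DequeueCount threshold from the sequence above (the threshold value and dead-letter queue are assumptions): a message that keeps reappearing is parked and deleted instead of being retried forever.

// C#
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageHandling
{
    private const int MaxDequeueCount = 3;   // assumed threshold; tune for your workload

    public static void HandleNext(CloudQueue queue, CloudQueue deadLetterQueue)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Poison message: keep a copy for diagnosis, then remove it from the work queue.
            deadLetterQueue.AddMessage(new CloudQueueMessage(msg.AsString));
            queue.DeleteMessage(msg);
            return;
        }

        // Normal path: process, then delete. If processing crashes, the message
        // reappears after the visibility timeout with an incremented DequeueCount.
        Process(msg);
        queue.DeleteMessage(msg);
    }

    private static void Process(CloudQueueMessage msg) { /* application-specific */ }
}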

Queues Recap

Make message processing idempotent
• No need to deal with failures

Do not rely on order
• Invisible messages result in out-of-order delivery

Use DequeueCount to remove poison messages
• Enforce a threshold on a message's dequeue count

Use a blob to store the message data, with a reference in the message
• Messages > 8 KB
• Batch messages
• Garbage collect orphaned blobs

Use the message count to scale
• Dynamically increase/reduce workers

(A sketch of the blob-reference "work ticket" pattern follows below.)
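A minimal sketch of the blob-reference (work ticket) pattern from the recap, with assumed method names: large payloads go to blob storage and only a small ticket travels through the queue, which also keeps messages under the 8 KB limit.

// C#
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class WorkTicketPattern
{
    // Producer: store the payload in a blob, enqueue only its name.
    public static void Enqueue(CloudBlobContainer container, CloudQueue queue, string payload)
    {
        string blobName = Guid.NewGuid().ToString("N");
        container.GetBlobReference(blobName).UploadText(payload);   // payload may exceed 8 KB
        queue.AddMessage(new CloudQueueMessage(blobName));          // the ticket stays tiny
    }

    // Consumer: resolve the ticket, process the payload, then clean up blob and message.
    public static void ProcessNext(CloudBlobContainer container, CloudQueue queue)
    {
        CloudQueueMessage ticket = queue.GetMessage();
        if (ticket == null) return;

        CloudBlob blob = container.GetBlobReference(ticket.AsString);
        string payload = blob.DownloadText();
        // ... process payload ...

        blob.Delete();                  // avoid orphaned blobs
        queue.DeleteMessage(ticket);
    }
}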

Windows Azure Storage Takeaways
• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                                          49

A Quick Exercise
…Then let's look at some code and some tools

                                                                          50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                          51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }

    // ... (continued on the following slides)
}

                                                                          52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                          53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                                                          54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                          55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                                          56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                          57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

    // ... (continued on the following slides)
}

                                                                          58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception.
    }
}

                                                                          59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

                                                                          60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                          61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                                                          62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                          63

Tools – Fiddler2

                                                                          Best Practices

                                                                          Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                          Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                          Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
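A minimal sketch (with placeholder work items) of the two TPL styles just listed: Parallel.ForEach for data parallelism and Task.Factory.StartNew for task parallelism.

// C# (.NET 4)
using System;
using System.Threading.Tasks;

public static class TplExamples
{
    public static void Run(string[] workItems)
    {
        // Data parallelism: the same operation applied to every element, scheduled across cores.
        Parallel.ForEach(workItems, item => ProcessItem(item));

        // Task parallelism: independent operations running concurrently.
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing output"));
        Task upload = Task.Factory.StartNew(() => Console.WriteLine("uploading results"));
        Task.WaitAll(compress, upload);
    }

    private static void ProcessItem(string item)
    {
        Console.WriteLine("processing " + item);
    }
}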

                                                                          Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                          Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs (performance vs. cost)

                                                                          Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

                                                                          Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places: sending fewer things means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                          Compressing Content

1. Gzip all output content (see the sketch after the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip plus JavaScript, CSS and image minification to become compressed content.]
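A minimal sketch of tip 1 using the standard ASP.NET response-filter approach (a common pattern, not code from the deck): when the client advertises gzip support, the outgoing stream is wrapped in a GZipStream.

// C# (e.g. in Global.asax.cs)
using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;

        if (acceptEncoding.Contains("gzip"))
        {
            // Compress the response on the fly; modern browsers decompress transparently.
            context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}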

                                                                          Best Practices Summary

Doing 'less' is the key to saving costs.
Measure everything.
Know your application profile in and out.

                                                                          Research Examples in the Cloud

…on another set of slides

                                                                          Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                          76

                                                                          Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx

                                                                          77

Questions and Discussion…

                                                                          Thank you for hosting me at the Summer School

                                                                          BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs, and watch out for the 2-hour maximum limitation (a sketch follows below)
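A minimal sketch of the visibilityTimeout best practice above (the helper, its calibration constant, and the padding factor are assumptions, not AzureBLAST source code): estimate the run time from the partition's pair-bases, add head-room, and clamp to the 2-hour maximum.

// C#
using System;

public static class VisibilityTimeoutEstimator
{
    // secondsPerMegabase comes from profiling test runs, as suggested above.
    public static TimeSpan Estimate(long pairBasesInPartition, double secondsPerMegabase)
    {
        double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMegabase;
        double padded = estimatedSeconds * 1.5;                                   // head-room for slow instances
        double clamped = Math.Min(padded, TimeSpan.FromHours(2).TotalSeconds);    // 2-hour queue maximum
        return TimeSpan.FromSeconds(Math.Max(clamped, 30));                       // never below the 30 s default
    }
}

// Usage (assumed): queue.GetMessage(VisibilityTimeoutEstimator.Estimate(partitionPairBases, calibratedRate));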

[Diagram: a splitting task fans the input out to many BLAST tasks, whose outputs are combined by a merging task. A sketch of the splitting step follows below.]
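A minimal sketch of the splitting task's core step (a hypothetical helper, not the AzureBLAST implementation): partition a FASTA input into groups of a fixed number of sequences, which the results below suggest should be around 100 per partition.

// C#
using System.Collections.Generic;
using System.IO;

public static class FastaSplitter
{
    // Returns partitions, each holding at most sequencesPerPartition FASTA records.
    public static List<List<string>> Split(string fastaPath, int sequencesPerPartition)
    {
        var partitions = new List<List<string>>();
        var current = new List<string>();
        string record = null;

        foreach (string line in File.ReadLines(fastaPath))
        {
            if (line.StartsWith(">"))                       // a header line starts a new record
            {
                if (record != null) current.Add(record);
                if (current.Count == sequencesPerPartition)
                {
                    partitions.Add(current);
                    current = new List<string>();
                }
                record = line + "\n";
            }
            else if (record != null)
            {
                record += line + "\n";                      // sequence data for the current record
            }
        }

        if (record != null) current.Add(record);
        if (current.Count > 0) partitions.Add(current);
        return partitions;
    }
}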

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resource

[AzureBLAST architecture diagram: a web role hosts the web portal and web service used for job registration; a job management role runs the job scheduler and scaling engine and dispatches work through a global dispatch queue to worker instances; Azure Tables hold the job registry, Azure Blobs hold the NCBI/BLAST databases and temporary data, and a database-updating role keeps the BLAST databases current. Within a job, a splitting task fans out BLAST tasks whose results are combined by a merging task.]

Web Portal (Job Portal)
• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total.

All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, over ~30 mins): this is an update domain at work.

35 nodes experienced blob-writing failures at the same time.

West Europe Data Center: 30,976 tasks were completed and the job was killed.

A reasonable guess: the fault domain is working.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." - Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

    ET = \frac{\Delta R_n + \rho_a c_p \, \delta q \, g_a}{(\Delta + \gamma (1 + g_a / g_s)) \, \lambda_v}
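A minimal sketch (hypothetical helper, with unit handling left to the caller) of evaluating the Penman-Monteith expression above for a single cell.

// C#
public static class PenmanMonteith
{
    // Arguments use the units listed above; returns ET.
    public static double ComputeET(
        double delta,    // Δ  : Pa K^-1
        double rn,       // Rn : W m^-2
        double rhoA,     // ρa : kg m^-3
        double cp,       // cp : J kg^-1 K^-1
        double deltaQ,   // δq : Pa
        double gA,       // ga : m s^-1
        double gS,       // gs : m s^-1
        double gamma,    // γ  : Pa K^-1 (≈ 66)
        double lambdaV)  // λv : latent heat of vaporization
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * gA;
        double denominator = (delta + gamma * (1.0 + gA / gS)) * lambdaV;
        return numerator / denominator;
    }
}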

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
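To make the formula concrete, here is a minimal C# sketch of the Penman-Monteith calculation using the symbols defined above; the method name and the default value for γ are illustrative and are not part of MODISAzure.

    // A minimal sketch of the Penman-Monteith calculation (symbols as defined above).
    static double Evapotranspiration(
        double delta,        // Δ  : rate of change of saturation specific humidity with air temperature (Pa/K)
        double rn,           // Rn : net radiation (W/m^2)
        double rhoA,         // ρa : dry air density (kg/m^3)
        double cp,           // cp : specific heat capacity of air (J/(kg K))
        double dq,           // δq : vapor pressure deficit (Pa)
        double ga,           // ga : conductivity of air (m/s)
        double gs,           // gs : conductivity of plant stoma (m/s)
        double lambdaV,      // λv : latent heat of vaporization (J/g)
        double gamma = 66.0) // γ  : psychrometric constant (Pa/K), approximate value quoted above
    {
        return (delta * rn + rhoA * cp * dq * ga)
               / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }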

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: users submit requests through the AzureMODIS Service Web Role Portal; a Request Queue feeds the Data Collection stage, which pulls source imagery from download sites via the Download Queue, followed by the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue driving the Reprojection, Derivation Reduction, and Analysis Reduction stages; source metadata is kept alongside, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue, which is consumed by GenericWorker (Worker Role) instances reading from and writing to <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request arrives at the service, which persists ReprojectionJobStatus; the Service Monitor (Worker Role) reads the Job Queue, parses and persists ReprojectionTaskStatus, and dispatches work to the Task Queue consumed by GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[The pipeline diagram (AzureMODIS Service Web Role Portal, Request/Download/Reprojection/Reduction 1/Reduction 2 Queues, source imagery download sites, scientists downloading results) repeats here, annotated with per-stage data volumes and costs:]

Data Collection stage:       400-500 GB, 60K files; 10 MB/sec upload, 11 hours; <10 workers; $50 upload, $450 storage
Reprojection stage:          400 GB, 45K files; 3500 compute hours; 20-100 workers; $420 CPU, $60 download
Derivation Reduction stage:  5-7 GB, 55K files; 1800 compute hours; 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction stage:    <10 GB, ~1K files; 1800 compute hours; 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                          • Day 2 - Azure as a PaaS
                                                                          • Day 2 - Applications

Expect Continuation Tokens – Seriously
A query response can stop early and return a continuation token:
• Maximum of 1000 rows in a response
• At the end of a partition range boundary
• Maximum of 5 seconds to execute the query
(A code sketch of handling continuation follows.)
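As an illustration, here is a minimal sketch (not from the original deck) using the Microsoft.WindowsAzure.StorageClient library; the Attendees table and AttendeeEntity type are the ones used in the code section later in these slides.

    // CloudTableQuery<T> (obtained via AsTableServiceQuery) follows continuation tokens
    // for you: Execute() transparently issues follow-up requests whenever the server
    // stops early (1000-row limit, partition boundary, or the 5-second execution limit).
    CloudTableClient tableClient =
        new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
    TableServiceContext context = tableClient.GetDataServiceContext();

    CloudTableQuery<AttendeeEntity> query =
        (from a in context.CreateQuery<AttendeeEntity>("Attendees")
         where a.PartitionKey == "SummerSchool"
         select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query.Execute())
    {
        Console.WriteLine(attendee.Email);   // every page of results, across all continuations
    }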

Tables Recap
• Efficient for frequently used queries
• Supports batch transactions
• Distributes load

• Select a PartitionKey and RowKey that help scale – distribute load by using a hash or similar prefix, and avoid "append only" patterns
• Always handle continuation tokens – expect them for range queries
• "OR" predicates are not optimized – execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries – "server busy" means the load on a single partition has exceeded its limits while partitions are load-balanced to meet traffic needs (see the retry sketch below)
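A minimal sketch of the back-off retry idea, assuming a WCF Data Services context like the one used later in this deck; the retry count and delays are illustrative.

    // Retry a table operation with a doubling delay when the service reports it is busy.
    TimeSpan delay = TimeSpan.FromMilliseconds(500);
    for (int attempt = 0; attempt < 5; attempt++)
    {
        try
        {
            context.SaveChanges();   // the table operation being retried
            break;
        }
        catch (DataServiceRequestException)
        {
            // "Server busy" / single-partition overload: back off, then try again.
            Thread.Sleep(delay);
            delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
        }
    }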

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (see the sketch below)
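For the last point, a minimal sketch, assuming the Attendees table and tableClient from the examples above; the email value is illustrative.

    // With IgnoreResourceNotFoundException set, a point query that finds nothing
    // returns null instead of surfacing the 404 as an exception.
    string email = "someone@example.org";
    TableServiceContext context = tableClient.GetDataServiceContext();
    context.IgnoreResourceNotFoundException = true;

    AttendeeEntity attendee =
        (from a in context.CreateQuery<AttendeeEntity>("Attendees")
         where a.PartitionKey == "SummerSchool" && a.RowKey == email
         select a).FirstOrDefault();

    if (attendee == null)
    {
        // The resource did not exist; no exception was thrown.
    }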

Queues: Their Unique Role in Building Reliable, Scalable Applications
• We want roles that work closely together but are not bound together
  • Tight coupling leads to brittleness
  • Loose coupling can aid scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Each message is limited to 8 KB in size
• Commonly use the work ticket pattern (see the sketch after this list)
• Why not simply use a table?
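A minimal sketch of the work ticket pattern mentioned above; the container, queue, largePayload, and Process method are illustrative names, not part of the deck's sample code.

    // Web role: store the large payload in blob storage, then enqueue a small
    // "work ticket" that only carries a reference to it.
    string ticketId = Guid.NewGuid().ToString();
    CloudBlob payloadBlob = container.GetBlobReference(ticketId);
    payloadBlob.UploadText(largePayload);
    queue.AddMessage(new CloudQueueMessage(ticketId));

    // Worker role: dequeue the ticket, fetch the payload, process it, and only then
    // delete the message, so a crash before completion lets another worker retry.
    CloudQueueMessage ticket = queue.GetMessage();
    if (ticket != null)
    {
        string payload = container.GetBlobReference(ticket.AsString).DownloadText();
        Process(payload);              // hypothetical processing step
        queue.DeleteMessage(ticket);
    }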

Queue Terminology: Message Lifecycle

[Diagram: a Web Role calls PutMessage to add Msg 1-4 to the queue; Worker Roles call GetMessage (with a visibility timeout) to receive Msg 1 and Msg 2 and call RemoveMessage once processing completes.]

PutMessage:
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DeleteMessage:
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
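The same lifecycle through the StorageClient library, as a minimal sketch; the account name and key are placeholders.

    CloudStorageAccount account = new CloudStorageAccount(
        new StorageCredentialsAccountAndKey("myaccount", "myKey"), false);
    CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("myqueue");
    queue.CreateIfNotExist();

    queue.AddMessage(new CloudQueueMessage("<test>test</test>"));        // PutMessage (POST)

    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // GetMessage with a 30 s visibility timeout
    if (msg != null)
    {
        // ... process the message ...
        queue.DeleteMessage(msg);   // DeleteMessage (DELETE) - the pop receipt is supplied for you
    }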

Truncated Exponential Back-Off Polling
Consider a back-off polling approach:
• Each empty poll increases the polling interval by 2x (up to a cap)
• A successful poll resets the interval back to 1
(A polling-loop sketch follows.)
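A minimal sketch of such a polling loop; the cap on the interval and the ProcessMessage method are illustrative.

    TimeSpan interval = TimeSpan.FromSeconds(1);
    TimeSpan maxInterval = TimeSpan.FromSeconds(64);   // truncation point (illustrative)

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            ProcessMessage(msg);                       // hypothetical handler
            queue.DeleteMessage(msg);
            interval = TimeSpan.FromSeconds(1);        // successful poll: reset the interval
        }
        else
        {
            Thread.Sleep(interval);                    // empty poll: wait, then double the interval
            interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
        }
    }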


Removing Poison Messages

[Diagram sequence: producers P1 and P2 enqueue messages; consumers C1 and C2 dequeue them from queue Q.]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after the dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) → msg 1
12. msg 1's DequeueCount is now > 2
13. C1: DeleteMessage(Q, msg 1) – the poison message is removed

(A code sketch of the DequeueCount check follows.)
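A minimal sketch of the DequeueCount check from step 12; the threshold and the optional dead-letter queue are illustrative.

    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        if (msg.DequeueCount > 2)
        {
            // The message has already failed processing repeatedly: treat it as poison.
            poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));  // optional dead-letter queue (illustrative)
            queue.DeleteMessage(msg);
        }
        else
        {
            ProcessMessage(msg);        // hypothetical handler
            queue.DeleteMessage(msg);
        }
    }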

Queues Recap
• Make message processing idempotent – then there is no special failure handling to deal with
• Do not rely on message order – invisible messages can come back out of order
• Enforce a threshold on a message's dequeue count – use DequeueCount to remove poison messages
• Messages larger than 8 KB – use a blob to store the message data and put a reference in the message; batch messages where useful and garbage-collect orphaned blobs
• Dynamically increase/reduce workers – use the message count to scale (see the sketch after this list)
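A minimal sketch of using the message count to drive scaling decisions; the thresholds and the ScaleWorkersTo call (which would go through the management API) are hypothetical.

    // Check the queue backlog and decide how many workers are needed.
    int backlog = queue.RetrieveApproximateMessageCount();
    int desiredWorkers = Math.Max(2, Math.Min(20, backlog / 100));   // e.g. one worker per ~100 messages
    ScaleWorkersTo(desiredWorkers);   // hypothetical: adjust the instance count via the management API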

Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

                                                                            49

                                                                            A Quick Exercise

…Then let's look at some code and some tools.

                                                                            50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                            51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                            52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                            53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

                                                                            54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                            55

Code – Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    // ...
}

                                                                            56

Code – Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                            57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                            58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or some other exception
    }
}

                                                                            59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                            60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                            61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                            62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                            63

Tools – Fiddler2

                                                                            Best Practices

                                                                            Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                            Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each of which leaves its CPU mostly idle
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

                                                                            Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
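A minimal sketch of both flavors with the .NET 4 Task Parallel Library; the work-item collection and the ProcessItem/ComputeA/ComputeB methods are illustrative.

    // Data parallelism: spread independent work items across all cores of the VM.
    Parallel.ForEach(workItems, item => ProcessItem(item));          // hypothetical ProcessItem

    // Task parallelism: run distinct sub-tasks concurrently and wait for both.
    Task<double> a = Task.Factory.StartNew(() => ComputeA());        // hypothetical ComputeA/ComputeB
    Task<double> b = Task.Factory.StartNew(() => ComputeB());
    Task.WaitAll(a, b);
    double combined = a.Result + b.Result;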

                                                                            Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                            Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of having idling VMs

                                                                            Performance Cost

                                                                            Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile
• Caching and compressing help a lot with storage costs

                                                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places.
Sending fewer things means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                            Compressing Content

1. Gzip all output content (a sketch follows after the diagram note below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]
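A minimal sketch of point 1 for an ASP.NET web role, wrapping the response stream in a GZipStream when the client advertises gzip support; the module name is illustrative.

    using System.IO.Compression;
    using System.Web;

    public class GzipResponseModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            app.BeginRequest += (sender, e) =>
            {
                HttpContext ctx = ((HttpApplication)sender).Context;
                string acceptEncoding = ctx.Request.Headers["Accept-Encoding"] ?? string.Empty;
                if (acceptEncoding.Contains("gzip"))
                {
                    // Compress the outgoing response on the fly; browsers decompress transparently.
                    ctx.Response.Filter = new GZipStream(ctx.Response.Filter, CompressionMode.Compress);
                    ctx.Response.AppendHeader("Content-Encoding", "gzip");
                }
            };
        }

        public void Dispose() { }
    }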

                                                                            Best Practices Summary

Doing 'less' is the key to saving costs

                                                                            Measure everything

                                                                            Know your application profile in and out

                                                                            Research Examples in the Cloud

…on another set of slides

                                                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                            76

                                                                            Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                            77

                                                                            Questions and Discussionhellip

                                                                            Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the generally suggested application model
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data-transfer overhead
• Best practice: profile with test runs and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of an instance failure
• Best practice: estimate the value from the number of base pairs in the partition and from test runs (see the sketch below)
• Watch out for the 2-hour maximum limit
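A sketch of that estimate is below; the per-million-base-pair rate and the safety factor are invented placeholders to be calibrated from test runs, not values from AzureBLAST.

using System;

public static class VisibilityTimeoutEstimator
{
    // Illustrative constants: calibrate these by profiling test runs on the target instance size.
    private const double MinutesPerMillionBasePairs = 2.0;  // assumed rate, not a measured value
    private const double SafetyFactor = 1.5;                // head-room so a slow task is not re-dispatched
    private static readonly TimeSpan MaxTimeout = TimeSpan.FromHours(2); // visibility-timeout ceiling

    public static TimeSpan Estimate(long basePairsInPartition)
    {
        double minutes = (basePairsInPartition / 1e6) * MinutesPerMillionBasePairs * SafetyFactor;
        TimeSpan estimate = TimeSpan.FromMinutes(minutes);
        return estimate < MaxTimeout ? estimate : MaxTimeout;
    }
}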

[Figure: a splitting task fans the input out into parallel BLAST tasks, whose results feed a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Figure: AzureBLAST architecture. A Web Role hosts the web portal and web service for job registration. A Job Management Role runs the job scheduler, job registry, scaling engine, and database-updating role. Tasks flow through a global dispatch queue to the worker instances. Azure Tables hold the job registry; Azure Blob storage holds the NCBI/BLAST databases, temporary data, etc. A splitting task fans out into parallel BLAST tasks that feed a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track each job's status and logs
• Authentication/authorization is based on Live ID
• The accepted job is stored in the job registry table (fault tolerance: avoid in-memory state)


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec
AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences
• "All against all" query: the database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances appear, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working-instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, ~30 minutes apart: this is an update domain
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

$ET = \dfrac{\Delta\, R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\, \lambda_v}$

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
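As a worked example, the Penman-Monteith equation above can be evaluated directly once the inputs are known; this small helper simply transcribes the formula (the default for γ follows the symbol list; unit handling and any conversion to a water-volume flux are left to the caller).

public static class PenmanMonteith
{
    // Direct evaluation of the combination equation shown above.
    // Arguments follow the symbol list: delta (Pa/K), rn (W/m^2), rhoA (kg/m^3),
    // cp (J/(kg K)), deltaQ (Pa), ga (m/s), gs (m/s), lambdaV (J/g), gamma (Pa/K).
    public static double Evapotranspiration(
        double delta, double rn, double rhoA, double cp,
        double deltaQ, double ga, double gs,
        double lambdaV, double gamma = 66.0)
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}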

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Figure: MODISAzure pipeline. The AzureMODIS Service Web Role Portal receives requests via a request queue; a download queue feeds the Data Collection Stage, which pulls from the source imagery download sites; a reprojection queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 queues feed the Derivation Reduction and Analysis Reduction Stages; source metadata is kept alongside, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks: recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Figure: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> job queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage> task queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage between Azure blob storage and local Azure Worker instance files

[Figure: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> task queue; GenericWorker (Worker Role) instances dequeue tasks and read their <Input> data storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a sketch of this loop follows)
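A minimal sketch of that loop is shown below; it is not the actual GenericWorker code. The task-message format, the RunTask and RecordTaskStatus helpers, and the 30-minute visibility timeout are assumptions; only the retry-three-times rule comes from the bullets above.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerSketch
{
    private const int MaxAttempts = 3;           // retry failed tasks 3 times, as described above
    private readonly CloudQueue taskQueue;

    public GenericWorkerSketch(CloudStorageAccount account, string queueName)
    {
        taskQueue = account.CreateCloudQueueClient().GetQueueReference(queueName);
    }

    public void PollOnce()
    {
        CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
        if (msg == null)
            return;                              // nothing to do

        if (msg.DequeueCount > MaxAttempts)
        {
            RecordTaskStatus(msg.AsString, "Failed");   // placeholder for the table-based status update
            taskQueue.DeleteMessage(msg);               // give up on a repeatedly failing task
            return;
        }

        try
        {
            RunTask(msg.AsString);               // placeholder: marshal blobs locally, run the executable
            RecordTaskStatus(msg.AsString, "Completed");
            taskQueue.DeleteMessage(msg);
        }
        catch (Exception)
        {
            // Do not delete: the message becomes visible again after the timeout and is retried.
        }
    }

    private void RunTask(string taskDescription) { /* sandboxed executable */ }
    private void RecordTaskStatus(string taskDescription, string status) { /* persist to an Azure table */ }
}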

[Figure: reprojection dataflow. A reprojection request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e., a single tile), then dispatches through the job and task queues to GenericWorker (Worker Role) instances. Workers query SwathGranuleMeta for geo-metadata (e.g., boundaries) of each swath tile and ScanTimeList for the list of satellite scan times that cover a target tile, read the swath source data storage, and write the reprojection data storage.]

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Stage                 | Data                              | Compute                     | Cost
Data Collection       | 400-500 GB, 60K files, 10 MB/sec  | 11 hours, <10 workers       | $50 upload, $450 storage
Reprojection          | 400 GB, 45K files                 | 3500 hours, 20-100 workers  | $420 cpu, $60 download
Derivation Reduction  | 5-7 GB, 55K files                 | 1800 hours, 20-100 workers  | $216 cpu, $1 download, $6 storage
Analysis Reduction    | <10 GB, ~1K files                 | 1800 hours, 20-100 workers  | $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                            • Day 2 - Azure as a PaaS
                                                                            • Day 2 - Applications

Tables Recap
• Select a PartitionKey and RowKey that help you scale: efficient for frequently used queries, supports batch transactions, distributes load
• Avoid "append only" patterns: distribute the load by using a hash or similar value as a key prefix
• Always handle continuation tokens: expect them for range queries (a sketch follows this list)
• "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries
• Implement a back-off strategy for retries: the server may be busy, the load on a single partition may have exceeded its limits, and partitions are load-balanced to meet traffic needs
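As an illustration of handling continuation tokens explicitly, here is a sketch using the WCF Data Services continuation API; Context, tableName, and AttendeeEntity are the helpers shown later in this section, and Process is a placeholder for application logic.

using System.Data.Services.Client;

// Page through a range query manually, following continuation tokens
// until the server reports no more results.
private void ReadAllAttendeesPaged()
{
    DataServiceQuery<AttendeeEntity> query = Context.CreateQuery<AttendeeEntity>(tableName);
    QueryOperationResponse<AttendeeEntity> response =
        (QueryOperationResponse<AttendeeEntity>)query.Execute();
    DataServiceQueryContinuation<AttendeeEntity> token = null;
    do
    {
        foreach (AttendeeEntity attendee in response)
            Process(attendee);                  // placeholder for application logic
        token = response.GetContinuation();     // null when there are no more pages
        if (token != null)
            response = Context.Execute(token);  // fetch the next page
    } while (token != null);
}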

WCF Data Services
• Use a new context for each logical operation
• AddObject/AttachTo can throw an exception if the entity is already being tracked
• A point query throws an exception if the resource does not exist; use IgnoreResourceNotFoundException (example below)
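A hedged example of the last point, reusing the attendee entity from later in this section; the table name, partition key, and row key values are illustrative.

using System.Linq;

// Returns null when the entity does not exist instead of throwing.
public AttendeeEntity TryGetAttendee(string email)
{
    Context.IgnoreResourceNotFoundException = true;    // a missing entity yields an empty result, not a 404 exception
    var query = from a in Context.CreateQuery<AttendeeEntity>("Attendees")
                where a.PartitionKey == "SummerSchool" && a.RowKey == email
                select a;
    return query.AsEnumerable().FirstOrDefault();       // enumerate client-side; null if not found
}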

Queues: Their Unique Role in Building Reliable, Scalable Applications
• You want roles that work closely together but are not bound together: tight coupling leads to brittleness
• This can aid scaling and performance
• A queue can hold an unlimited number of messages
  • Messages must be serializable as XML
  • Each message is limited to 8 KB in size
  • The work-ticket pattern is commonly used
• Why not simply use a table?

Queue Terminology

[Figure: message lifecycle. A Web Role calls PutMessage to add messages (Msg 1 … Msg 4) to a queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue a message and RemoveMessage to delete it once it has been processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
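The same lifecycle through the managed StorageClient library rather than raw REST; a minimal sketch that assumes the AccountInformation helper shown later in this section and an arbitrary queue name.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static void DemonstrateMessageLifecycle()
{
    CloudQueueClient queueClient =
        new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
    CloudQueue queue = queueClient.GetQueueReference("myqueue");
    queue.CreateIfNotExist();

    queue.AddMessage(new CloudQueueMessage("work ticket goes here"));    // PutMessage

    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // GetMessage with a 30 s visibility timeout
    if (msg != null)
    {
        // ... process the work ticket ...
        queue.DeleteMessage(msg);                                        // RemoveMessage once processing succeeds
    }
}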

Truncated Exponential Back-Off Polling
• Consider a back-off polling approach: each empty poll increases the polling interval by 2x
• A successful poll sets the interval back to 1
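A sketch of that polling loop; the 1-second floor and 64-second ceiling are illustrative choices, not prescribed values.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

// Truncated exponential back-off polling: double the wait after each empty poll,
// reset to the minimum after a successful one, and never exceed the ceiling.
public static void PollWithBackOff(CloudQueue queue, Action<CloudQueueMessage> process)
{
    TimeSpan minDelay = TimeSpan.FromSeconds(1);    // assumed floor
    TimeSpan maxDelay = TimeSpan.FromSeconds(64);   // assumed ceiling ("truncated")
    TimeSpan delay = minDelay;

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            process(msg);
            queue.DeleteMessage(msg);
            delay = minDelay;                       // a successful poll resets the interval
        }
        else
        {
            Thread.Sleep(delay);                    // empty poll: wait, then try again
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks)); // 2x, truncated
        }
    }
}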

Removing Poison Messages

[Figure: producers P1 and P2 feed a queue consumed by C1 and C2.
1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2]

Removing Poison Messages (continued)

[Figure:
1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after the dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1]

Removing Poison Messages (continued)

[Figure:
1. C1: Dequeue(Q, 30 sec) returns msg 1
2. C2: Dequeue(Q, 30 sec) returns msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible 30 s after the dequeue
7. C2: Dequeue(Q, 30 sec) returns msg 1
8. C2 crashed
9. msg 1 becomes visible 30 s after the dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 sec) returns msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)]

Queues Recap
• No need to deal with failures: make message processing idempotent
• Invisible messages result in out-of-order delivery: do not rely on order
• Enforce a threshold on a message's dequeue count: use the dequeue count to remove poison messages
• Messages larger than 8 KB: use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs (a sketch follows)
• Dynamically increase or reduce workers: use the message count to scale
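A sketch of the blob-reference pattern from the recap; the naming scheme and the assumption that the container and queue already exist are illustrative.

using System;
using Microsoft.WindowsAzure.StorageClient;

// For payloads larger than the 8 KB message limit: store the data in a blob and
// enqueue only a reference to it. Remember to garbage-collect orphaned blobs.
public static void EnqueueLargePayload(CloudBlobContainer container, CloudQueue queue, string payload)
{
    string blobName = Guid.NewGuid().ToString();        // illustrative naming scheme
    CloudBlob blob = container.GetBlobReference(blobName);
    blob.UploadText(payload);                            // the real data lives in blob storage
    queue.AddMessage(new CloudQueueMessage(blobName));   // the queue message is just a pointer
}

public static string DequeueLargePayload(CloudBlobContainer container, CloudQueue queue)
{
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg == null)
        return null;
    CloudBlob blob = container.GetBlobReference(msg.AsString);
    string payload = blob.DownloadText();
    queue.DeleteMessage(msg);                            // delete the message...
    blob.Delete();                                       // ...and the blob, so it is not orphaned
    return payload;
}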

                                                                              Windows Azure Storage Takeaways

                                                                              Blobs

                                                                              Drives

                                                                              Tables

                                                                              Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net


                                                                              A Quick Exercise

…Then let's look at some code and some tools


Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}


Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}


Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g., "text/plain").
}


Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}


Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt


Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}


Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}


Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}


Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred
    }
}


Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}


Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                              62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it. Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                              63

Tools – Fiddler2

                                                                              Best Practices

                                                                              Picking the Right VM Size

• Having the correct VM size can make a big difference in costs.
• Fundamental choice: fewer, larger VMs vs. many smaller instances.
• If you scale better than linearly across cores, larger VMs could save you money.
• It is pretty rare to see linear scaling across 8 cores.
• More instances may provide better uptime and reliability (more failures are needed to take your service down).
• The only real right answer: experiment with multiple sizes and instance counts, measure, and find what is ideal for you.

                                                                              Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows.
• 1 role instance ≠ one specific task for your code.
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using much CPU.
• Balance using up the CPU vs. having free capacity in times of need.
• There are multiple ways to use your CPU to the fullest.

                                                                              Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency.
• This may not be ideal if the number of active processes exceeds the number of cores.
• Use multithreading aggressively.
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads.
• In .NET 4, use the Task Parallel Library (see the sketch below):
  • Data parallelism
  • Task parallelism
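A minimal sketch of both flavors with the .NET 4 Task Parallel Library; the Process, CompressImages, and UpdateIndex methods are placeholders invented for illustration, not APIs from this talk.

using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Data parallelism: the same operation applied to many items across cores.
        Parallel.ForEach(workItems, item => Process(item));

        // Task parallelism: distinct operations running concurrently.
        Task first = Task.Factory.StartNew(() => CompressImages());
        Task second = Task.Factory.StartNew(() => UpdateIndex());
        Task.WaitAll(first, second);
    }

    static void Process(int item) { Console.WriteLine("processed {0}", item); }
    static void CompressImages() { /* CPU-bound work */ }
    static void UpdateIndex() { /* independent work */ }
}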

                                                                              Finding Good Code Neighbors

• Typically, code falls into one or more of the categories below.
• Find pieces of code that are intensive in different resources and have them live together.
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code.

Categories: memory intensive · CPU intensive · network I/O intensive · storage I/O intensive.

                                                                              Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled).
• Spinning VMs up and down automatically is good at large scale.
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running.
• Being too aggressive in spinning down VMs can result in a poor user experience.
• There is a trade-off between the risk of failure or poor user experience from having no excess capacity and the cost of idling VMs.

Performance vs. cost

                                                                              Storage Costs

• Understand an application's storage profile and how storage billing works.
• Make service choices based on your app profile.
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction.
• The service choice can make a big cost difference based on your app profile.
• Caching and compressing help a lot with storage costs.

                                                                              Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                              Compressing Content

1. Gzip all output content (a sketch of enabling this in ASP.NET follows the diagram below).
   • All modern browsers can decompress on the fly.
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms.
2. Trade off compute costs for storage size.
3. Minimize image sizes.
   • Use Portable Network Graphics (PNGs).
   • Crush your PNGs.
   • Strip needless metadata.
   • Make all PNGs palette PNGs.

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images.]
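As an illustration of point 1 (gzip all output content), here is a minimal sketch of compressing dynamic output in an ASP.NET web role via Global.asax; this is one common approach, assumed for illustration rather than taken from this talk.

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpRequest request = HttpContext.Current.Request;
        HttpResponse response = HttpContext.Current.Response;
        string acceptEncoding = request.Headers["Accept-Encoding"];

        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            // Wrap the output stream so content is compressed on the way out.
            response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
            response.AppendHeader("Content-Encoding", "gzip");
            response.AppendHeader("Vary", "Accept-Encoding");
        }
    }
}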

                                                                              Best Practices Summary

Doing 'less' is the key to saving costs.

                                                                              Measure everything

                                                                              Know your application profile in and out

                                                                              Research Examples in the Cloud

…on another set of slides

                                                                              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud.
  • Hadoop implementation.
  • Hadoop has a long history and has been improved for stability.
  • Originally designed for cluster systems.

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure.
  • Designed from the start to use cloud primitives.
  • Built-in fault tolerance.
  • REST-based interface for writing your own clients.

                                                                              76

                                                                              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                              77

Questions and Discussion…

                                                                              Thank you for hosting me at the Summer School

                                                                              BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics.
• Identifies similarity between bio-sequences.

Computationally intensive:
• Large number of pairwise alignment operations.
• A BLAST run can take 700–1,000 CPU hours.
• Sequence databases are growing exponentially.
• GenBank doubled in size in about 15 months.

It is easy to parallelize BLAST:
• Segment the input.
• Segment processing (querying) is pleasingly parallel.
• Segment the database (e.g., mpiBLAST).
• Needs special result-reduction processing.

Large data volumes:
• A normal BLAST database can be as large as 10 GB.
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB.
• The output of BLAST is usually 10–100x larger than the input.

• Parallel BLAST engine on Azure.
• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query partitions in parallel
  • merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker.
• With three special considerations:
  • batch job management
  • task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance:
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity:
• Large partitions: load imbalance.
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead).
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead.

Value of visibilityTimeout for each BLAST task:
• Essentially an estimate of the task run time.
• Too small: repeated computation.
• Too large: an unnecessarily long waiting period in case of instance failure.
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below).
• Watch out for the 2-hour maximum limitation.
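A minimal sketch of applying these visibility-timeout practices when dequeuing a BLAST task, assuming the SDK 1.x StorageClient library; the "blasttasks" queue name and the EstimateMinutes/RunBlast helpers are illustrative placeholders, not the AzureBLAST implementation.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastTaskPoller
{
    public static void PollOnce(CloudStorageAccount account, long partitionPairBases)
    {
        CloudQueue taskQueue =
            account.CreateCloudQueueClient().GetQueueReference("blasttasks");

        // Estimate the task run time from the partition size (calibrated by test runs),
        // then clamp to the queue service's 2-hour maximum visibility timeout.
        TimeSpan visibility = TimeSpan.FromMinutes(EstimateMinutes(partitionPairBases));
        if (visibility > TimeSpan.FromHours(2))
            visibility = TimeSpan.FromHours(2);

        CloudQueueMessage msg = taskQueue.GetMessage(visibility);
        if (msg == null)
            return;

        RunBlast(msg.AsString);        // placeholder: run NCBI-BLAST on the partition
        taskQueue.DeleteMessage(msg);  // delete only after the results are safely stored
    }

    static double EstimateMinutes(long pairBases) { return pairBases / 1e6; } // placeholder model
    static void RunBlast(string partition) { /* launch blast, upload results to blobs */ }
}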

[Diagram: a Splitting task fans out into many BLAST tasks, whose outputs feed a Merging task.]

Task size vs. performance:
• Benefit of the warm-cache effect.
• 100 sequences per partition is the best choice.

Instance size vs. performance:
• Super-linear speedup with larger worker instances.
• Primarily due to memory capacity.

Task size / instance size vs. cost:
• The extra-large instance generated the best and most economical throughput.
• Fully utilizes the resource.

[Architecture diagram: a Web Portal / Web Service front end (Web Role) handles job registration and feeds a Job Scheduler; a Job Management Role with a Scaling Engine records jobs in a Job Registry (Azure Table) and dispatches work through a global dispatch queue to Worker instances; Azure Blob storage holds the NCBI/BLAST databases and temporary data, kept current by a database-updating role. On the workers, a Splitting task fans out into many BLAST tasks whose outputs feed a Merging task.]

An ASP.NET program hosted by a web role instance:
• Submit jobs.
• Track a job's status and logs.

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table (fault tolerance: avoid in-memory state).

[Diagram: the Web Portal / Web Service (Job Portal) passes job registrations to the Job Scheduler, which works with the Scaling Engine and records jobs in the Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min.
• Against ~5,000 proteins from another strain: completed in less than 30 sec.

AzureBLAST significantly saved computing time…

Discovering homologs:
• Discover the interrelationships of known protein sequences.

"All against all" query:
• The database is also the input query.
• The protein database is large (4.2 GB).
• In total, 9,865,668 sequences to be queried.
• Theoretically 100 billion sequence comparisons.

Performance estimation:
• Based on sampling runs on one extra-large Azure instance.
• Would require 3,216,731 minutes (6.1 years) on one desktop.

One of the biggest BLAST jobs as far as we know:
• Experiments at this scale are usually infeasible for most scientists.

• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), West Europe, and North Europe.
• 8 deployments of AzureBLAST, each with its own co-located storage service.
• Divide the 10 million sequences into multiple segments:
  • Each segment is submitted to one deployment as one job for execution.
  • Each segment consists of smaller partitions.
• When load imbalances occur, redistribute the load manually.


• The total size of the output result is ~230 GB.
• The total number of hits is 1,764,579,487.
• Started on March 25th; the last task completed on April 8th (10 days of compute).
• Based on our estimates, the real working instance time should be 6–8 days.
• Look into the log data to analyze what took place…


A normal log record should look like the ones below; otherwise something is wrong (e.g., a task failed to complete).

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes in one group).
• 35 nodes experienced blob-writing failures at the same time.

West Europe data center: 30,976 tasks were completed, and the job was killed.
A reasonable guess: the fault domain mechanism is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction.
• Some of the inputs are not so simple.

ET = \frac{\Delta\, R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a / g_s)\right)\, \lambda_v}

Penman-Monteith (1964)

                                                                              Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
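As a worked illustration of the formula above (not MODISAzure source code), a small function that evaluates Penman-Monteith for one pixel or time step; parameter names follow the variable list on this slide, and SI inputs are assumed.

using System;

static class PenmanMonteith
{
    // Evaluate ET from the inputs defined above.
    public static double ComputeET(
        double delta,    // Δ: slope of saturation specific humidity vs. temperature (Pa/K)
        double rn,       // Rn: net radiation (W/m^2)
        double rhoA,     // ρa: dry air density (kg/m^3)
        double cp,       // cp: specific heat capacity of air (J/(kg K))
        double deltaQ,   // δq: vapor pressure deficit (Pa)
        double ga,       // ga: conductivity of air (m/s)
        double gs,       // gs: conductivity of plant stoma (m/s)
        double lambdaV,  // λv: latent heat of vaporization (J/g)
        double gamma = 66.0)  // γ: psychrometric constant (Pa/K)
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}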

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

                                                                              20 US year = 1 global year

Data collection (map) stage:
• Downloads requested input tiles from NASA FTP sites.
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile.

Reprojection (map) stage:
• Converts source tile(s) to intermediate-result sinusoidal tiles.
• Simple nearest-neighbor or spline algorithms.

Derivation reduction stage:
• First stage visible to the scientist.
• Computes ET in our initial use.

Analysis reduction stage:
• Optional second stage visible to the scientist.
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors.

[Pipeline diagram: requests from scientists enter through the AzureMODIS Service Web Role Portal into a Request Queue; the Data Collection Stage pulls source imagery from download sites; Reprojection, Reduction 1, Reduction 2, and Download queues connect the Reprojection, Derivation Reduction, and Analysis Reduction stages; source metadata is tracked alongside, and scientific results are downloaded by the scientists at the end.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door:
  • Receives all user requests.
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue.
• The Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks, i.e., recoverable units of work.
  • Execution status of all jobs and tasks is persisted in Tables.

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses requests, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science or other executable.
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files.

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor.
• Retries failed tasks 3 times.
• Maintains all task status.
(A minimal sketch of this dequeue-and-retry loop follows.)
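A minimal sketch of the dequeue-and-retry behavior described above, assuming the SDK 1.x StorageClient library; the queue name and the RunTask/RecordStatus helpers are illustrative placeholders, not the actual MODISAzure implementation.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class GenericWorkerLoop
{
    public static void Run(CloudStorageAccount account)
    {
        CloudQueue taskQueue =
            account.CreateCloudQueueClient().GetQueueReference("pipelinestage-tasks");

        while (true)
        {
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(TimeSpan.FromSeconds(10)); continue; }

            if (msg.DequeueCount > 3)
            {
                // Already attempted three times: record the failure and drop the task.
                RecordStatus(msg.AsString, "Failed");
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                RunTask(msg.AsString);               // sandbox the executable, marshal blobs
                RecordStatus(msg.AsString, "Done");
                taskQueue.DeleteMessage(msg);        // delete only after success
            }
            catch (Exception)
            {
                // Leave the message; it reappears after the visibility timeout and is retried.
            }
        }
    }

    static void RunTask(string taskId) { /* placeholder */ }
    static void RecordStatus(string taskId, string status) { /* persist to an Azure table */ }
}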

[Diagram: a Reprojection Request arrives, its ReprojectionJobStatus is persisted, and the Service Monitor (Worker Role) parses and persists ReprojectionTaskStatus, dispatching work through the Job and Task Queues to GenericWorker (Worker Role) instances that read Swath Source Data Storage and write Reprojection Data Storage.]

Table callouts:
• ReprojectionJobStatus: each entity specifies a single reprojection job request.
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile).
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile.
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile.

• Computational costs are driven by data scale and the need to run reductions multiple times.
• Storage costs are driven by data scale and the 6-month project duration.
• Both are small with respect to the people costs, even at graduate-student rates.

[Cost figure, overlaid on the pipeline diagram (AzureMODIS Service Web Role Portal, stage queues, and download sites as above), with per-stage data volumes and charges:
• Data Collection Stage: 400–500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage.
• Reprojection Stage: 400 GB, 45K files, 3,500 hours, 20–100 workers; $420 CPU, $60 download.
• Derivation Reduction Stage: 5–7 GB, 55K files, 1,800 hours, 20–100 workers; $216 CPU, $1 download, $6 storage.
• Analysis Reduction Stage: <10 GB, ~1K files, 1,800 hours, 20–100 workers; $216 CPU, $2 download, $9 storage.
Total: $1,420]

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems.
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access.
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today.
• Clouds provide valuable fault tolerance and scalability abstractions.
• Clouds act as an amplifier for familiar client tools and on-premises compute.
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.

                                                                              • Day 2 - Azure as a PaaS
                                                                              • Day 2 - Applications

Queues: Their Unique Role in Building Reliable, Scalable Applications

• We want roles that work closely together but are not bound together; tight coupling leads to brittleness.
• This decoupling can aid scaling and performance.
• A queue can hold an unlimited number of messages.
  • Messages must be serializable as XML.
  • Each message is limited to 8 KB in size.
• Commonly we use the work ticket pattern (see the sketch below).
• Why not simply use a table?
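A minimal sketch of the work ticket pattern, assuming the SDK 1.x StorageClient library; the "uploads" container and "workitems" queue names are invented for illustration.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class WorkTicket
{
    // Web role side: store the large payload in blob storage and enqueue a small
    // ticket (well under the 8 KB message limit) that merely points at it.
    public static void Submit(CloudStorageAccount account, string blobName, string payload)
    {
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference("uploads");
        container.CreateIfNotExist();
        container.GetBlobReference(blobName).UploadText(payload);

        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(blobName));   // the "work ticket"
    }

    // Worker role side: dequeue the ticket, fetch the payload, process, then delete.
    public static void ProcessOne(CloudStorageAccount account)
    {
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("workitems");
        CloudQueueMessage ticket = queue.GetMessage(TimeSpan.FromMinutes(5));
        if (ticket == null)
            return;

        string payload = account.CreateCloudBlobClient()
            .GetContainerReference("uploads")
            .GetBlobReference(ticket.AsString)
            .DownloadText();
        // ... do the work with 'payload' ...
        queue.DeleteMessage(ticket);   // remove only after the work completes
    }
}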

                                                                                Queue Terminology

                                                                                Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1–4) to a Queue; Worker Roles call GetMessage (with a visibility timeout) to dequeue messages and RemoveMessage to delete them once processed.]

POST http://myaccount.queue.core.windows.net/myqueue/messages

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

                                                                                Truncated Exponential Back Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1. (A sketch follows.)
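A minimal sketch of truncated exponential back-off polling, assuming the SDK 1.x StorageClient library; the cap value and the Process helper are illustrative.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

class BackOffPoller
{
    public static void Poll(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromSeconds(64);   // truncation point

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg == null)
            {
                // Empty poll: wait, then double the interval up to the cap.
                Thread.Sleep(interval);
                double next = Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds);
                interval = TimeSpan.FromSeconds(next);
                continue;
            }

            interval = TimeSpan.FromSeconds(1);   // successful poll: reset the interval
            Process(msg);                         // placeholder for real work
            queue.DeleteMessage(msg);
        }
    }

    static void Process(CloudQueueMessage msg) { /* handle the work ticket */ }
}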


Removing Poison Messages

Diagram: producers P1 and P2 feed a queue worked by consumers C1 and C2. The event sequence:
1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) returns msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1), removing the poison message
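In code, the guard from step 12 amounts to checking DequeueCount before processing. A sketch, assuming the same CloudQueue as above and an illustrative threshold of 2; ProcessMessage is a hypothetical handler:

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // The message has already failed repeatedly: treat it as poison.
        // Optionally copy it to a "dead letter" blob or table for inspection.
        queue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg);       // hypothetical handler; must be idempotent
        queue.DeleteMessage(msg);
    }
}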

                                                                                Queues Recap

• Make message processing idempotent, so there is no need to deal with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Enforce a threshold on a message's dequeue count: use DequeueCount to remove poison messages
• For messages > 8 KB, use a blob to store the message data with a reference in the message; batch messages where possible and garbage-collect orphaned blobs (a sketch of the blob-reference pattern follows below)
• Use the message count to scale: dynamically increase or reduce workers
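For the "Messages > 8 KB" item, one common pattern (a sketch under assumptions, not taken from the slides) is to write the payload to a blob and enqueue only a reference; the container name, ProcessPayload handler, and the blobClient, queue, and largePayload variables are placeholders:

// Producer: store the large payload in a blob, enqueue just its name.
string blobName = Guid.NewGuid().ToString();
CloudBlobContainer container = blobClient.GetContainerReference("queuepayloads"); // assumed container
container.CreateIfNotExist();
container.GetBlobReference(blobName).UploadText(largePayload);
queue.AddMessage(new CloudQueueMessage(blobName));

// Consumer: resolve the reference, process, then garbage-collect the blob.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    CloudBlob payloadBlob = container.GetBlobReference(msg.AsString);
    string payload = payloadBlob.DownloadText();
    ProcessPayload(payload);       // hypothetical handler
    payloadBlob.Delete();          // avoid orphaned blobs
    queue.DeleteMessage(msg);
}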

                                                                                Windows Azure Storage Takeaways

                                                                                Blobs

                                                                                Drives

                                                                                Tables

                                                                                Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                                49

                                                                                A Quick Exercise

…Then let's look at some code and some tools.

                                                                                50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                52

                                                                                Code ndash BlobHelpercs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                                53

                                                                                Code ndash BlobHelpercs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                                                                54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

                                                                                56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred.
    }
}

                                                                                59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the in-memory cache
    allAttendees.Remove(email);
}

                                                                                60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                63

Tools – Fiddler2

                                                                                Best Practices

                                                                                Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting up code into multiple roles, each not using up its CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (see the sketch below)
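A small Task Parallel Library sketch (.NET 4) illustrating both flavors; workItems, ProcessItem, BuildIndex, and BuildReport are placeholders, not APIs from this deck:

using System.Threading.Tasks;

// Data parallelism: apply the same operation to every element of a collection.
Parallel.ForEach(workItems, item => ProcessItem(item));   // ProcessItem is hypothetical

// Task parallelism: run independent operations concurrently and wait for both.
Task<int> indexTask  = Task.Factory.StartNew(() => BuildIndex());   // hypothetical
Task<int> reportTask = Task.Factory.StartNew(() => BuildReport());  // hypothetical
Task.WaitAll(indexTask, reportTask);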

                                                                                Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale (see the sketch below)
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• It is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of idling VMs
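One lightweight way to drive that decision (a sketch, not prescribed by the slides) is to derive a target instance count from the queue depth and clamp it; the 500-messages-per-worker ratio and the 2-20 instance bounds are assumptions:

// Assumed policy: one worker per 500 queued messages, between 2 and 20 instances.
public static int TargetWorkerCount(CloudQueue queue)
{
    int depth = queue.RetrieveApproximateMessageCount();   // approximate by design
    int desired = (int)Math.Ceiling(depth / 500.0);
    return Math.Max(2, Math.Min(20, desired));
}

// The result would then be applied via the Service Management API, or by
// editing the instance count in the service configuration.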


                                                                                Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

                                                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                Compressing Content

1. Gzip all output content (a sketch follows after this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
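A minimal sketch of Gzip-compressing a text response with System.IO.Compression; whether you compress in application code or let IIS handle it is a deployment choice:

using System.IO;
using System.IO.Compression;
using System.Text;

public static byte[] GzipText(string content)
{
    byte[] raw = Encoding.UTF8.GetBytes(content);
    using (MemoryStream output = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        // Serve the result with "Content-Encoding: gzip" so browsers decompress on the fly.
        return output.ToArray();
    }
}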

Diagram: uncompressed content passes through Gzip plus JavaScript, CSS, and image minification to become compressed content.

                                                                                Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                Research Examples in the Cloud

…on another set of slides

                                                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • It is a Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • It was originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                76

                                                                                Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                77

                                                                                Questions and Discussionhellip

                                                                                Thank you for hosting me at the Summer School

                                                                                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST): needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query the partitions in parallel
  • merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • batch job management
  • task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Too large a partition: load imbalance
• Too small a partition: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
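A sketch of how such an estimate might be computed before enqueuing a partition; the per-million-pair-bases rate and the safety factor are illustrative assumptions derived from hypothetical test runs, not measured values from AzureBLAST:

// Estimate the visibility timeout for a BLAST task from its input size,
// based on profiling runs, and respect the queue's 2-hour maximum.
public static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition)
{
    const double secondsPerMillionPairBases = 40.0; // assumed, from test runs
    const double safetyFactor = 1.5;                // assumed head-room

    double seconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases * safetyFactor;
    double capped = Math.Min(seconds, TimeSpan.FromHours(2).TotalSeconds);
    return TimeSpan.FromSeconds(Math.Max(60, capped)); // at least one minute
}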

Diagram: a splitting task fans the input out to many BLAST tasks, and a merging task combines their results.

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

Architecture diagram components: a Web Portal and Web Service (Web Role) for job registration; a Job Scheduler and Scaling Engine hosted in a Job Management Role; Worker instances fed from a global dispatch queue; an Azure Table holding the Job Registry; Azure Blob storage holding the BLAST (NCBI) databases and temporary data; and a Database Updating Role.


• An ASP.NET program hosted by a web role instance: submit jobs, track job status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table (fault tolerance: avoid in-memory state)


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time.

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of: this scale of experiment is usually infeasible for most scientists.

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total.

All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, over ~30 minutes): this is an update domain at work. 35 nodes experienced a blob-writing failure at the same time.

West Europe datacenter: 30,976 tasks were completed and the job was killed. A reasonable guess: the fault domain is working.

                                                                                MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)        (Penman-Monteith, 1964)

                                                                                Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
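A direct transcription of the Penman-Monteith formula above into code can serve as a sanity check for the reduction stage; this is a sketch, and the parameter ordering and units simply mirror the definitions listed above:

// ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
public static double Evapotranspiration(
    double delta,    // Δ: d(saturation specific humidity)/dT (Pa/K)
    double rn,       // Rn: net radiation (W/m^2)
    double rhoA,     // ρa: dry air density (kg/m^3)
    double cp,       // cp: specific heat capacity of air (J/(kg·K))
    double deltaQ,   // δq: vapor pressure deficit (Pa)
    double ga,       // ga: conductivity of air (m/s)
    double gs,       // gs: conductivity of plant stoma (m/s)
    double lambdaV,  // λv: latent heat of vaporization (J/g)
    double gamma)    // γ: psychrometric constant, ~66 Pa/K
{
    double numerator = delta * rn + rhoA * cp * deltaQ * ga;
    double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
    return numerator / denominator; // water volume evapotranspired (m^3 s^-1 m^-2)
}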

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

Diagram: the AzureMODIS Service Web Role Portal accepts requests onto a Request Queue; work flows through the Data Collection Stage, Reprojection Stage, Derivation Reduction Stage, and Analysis Reduction Stage via dedicated queues (Download, Reprojection, Reduction 1, Reduction 2), drawing source imagery from download sites and tracking source metadata; scientists download the resulting science results.

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, which are recoverable units of work
• The execution status of all jobs and tasks is persisted in Tables

Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage> Task Queue.

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshals all storage to/from Azure blob storage and the local Azure Worker instance files

                                                                                Service Monitor

                                                                                (Worker Role)

                                                                                Parse amp Persist ltPipelineStagegtTaskStatus

                                                                                GenericWorker

                                                                                (Worker Role)

                                                                                hellip

                                                                                hellip

                                                                                Dispatch

                                                                                ltPipelineStagegtTask Queue

                                                                                hellip

                                                                                ltInputgtData Storage

                                                                                bull Dequeues tasks created by the Service Monitor

                                                                                bull Retries failed tasks 3 times

                                                                                bull Maintains all task status

Example: a Reprojection Request

[Diagram: a Reprojection Request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus via the Job Queue, then parses and persists ReprojectionTaskStatus and dispatches tasks through the Task Queue to GenericWorker (Worker Role) instances; the workers read Swath Source Data Storage and write Reprojection Data Storage]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and by the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Pipeline diagram with per-stage scale and cost, fronted by the AzureMODIS Service Web Role Portal: source imagery download sites feed the Data Collection Stage via the Download Queue and Request Queue; Source Metadata plus the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue feed the Reprojection, Derivation Reduction, and Analysis Reduction Stages; scientists retrieve results via Scientific Results Download]

Per-stage scale and cost:
• Data collection (download) stage: 400-500 GB, 60K files, 10 MB/sec for 11 hours, <10 workers; $50 upload + $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 cpu + $60 download
• Derivation reduction stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 cpu + $1 download + $6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 cpu + $2 download + $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed, and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                • Day 2 - Azure as a PaaS
                                                                                • Day 2 - Applications

Queue Terminology: Message Lifecycle

[Diagram: a Web Role produces messages with PutMessage; the queue holds Msg 1 ... Msg 4; Worker Roles consume them with GetMessage (which takes a visibility timeout) and, once processing succeeds, RemoveMessage]

PutMessage (REST):

    POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response (REST):

    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Content-Type: application/xml
    Date: Tue, 09 Dec 2008 21:04:30 GMT
    Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

    <?xml version="1.0" encoding="utf-8"?>
    <QueueMessagesList>
      <QueueMessage>
        <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
        <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
        <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
        <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
        <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
        <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
      </QueueMessage>
    </QueueMessagesList>

RemoveMessage (REST):

    DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
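For comparison, here is a minimal sketch of the same PutMessage / GetMessage / RemoveMessage lifecycle using the 1.x StorageClient library instead of raw REST. It assumes the AccountInformation helper defined later in this deck; the queue name is illustrative.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class QueueLifecycleSample
    {
        public static void Run()
        {
            CloudStorageAccount account =
                new CloudStorageAccount(AccountInformation.Credentials, false);
            CloudQueueClient queueClient = account.CreateCloudQueueClient();
            CloudQueue queue = queueClient.GetQueueReference("myqueue");
            queue.CreateIfNotExist();

            // PutMessage
            queue.AddMessage(new CloudQueueMessage("<test>t</test>"));

            // GetMessage with a 30 second visibility timeout
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                // ... process the message here ...

                // RemoveMessage (uses the pop receipt carried by msg)
                queue.DeleteMessage(msg);
            }
        }
    }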

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, up to some maximum, and a successful poll resets the interval back to 1.
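A minimal sketch of that back-off loop, assuming the CloudQueue object from the previous sketch; ProcessMessage and the one-minute cap are illustrative assumptions, not part of the original slides.

    // Truncated exponential back-off polling: double the interval on every
    // empty poll (up to a cap), reset it to 1 second on a successful poll.
    TimeSpan interval = TimeSpan.FromSeconds(1);
    TimeSpan maxInterval = TimeSpan.FromMinutes(1);   // truncation point (assumed)

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            ProcessMessage(msg);                      // hypothetical helper
            queue.DeleteMessage(msg);
            interval = TimeSpan.FromSeconds(1);       // successful poll: reset
        }
        else
        {
            System.Threading.Thread.Sleep(interval);
            // Empty poll: double the interval, but never exceed the cap.
            interval = TimeSpan.FromSeconds(
                Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
        }
    }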

Removing Poison Messages (walkthrough with producers P1, P2 and consumers C1, C2)

Normal consumption:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2

A consumer crashes:
1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

A poison message:
1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 is restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1), removing the poison message from the queue
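A minimal sketch of the DequeueCount guard from step 12 above; the threshold of 2 follows the walkthrough, and ProcessMessage is again a hypothetical helper.

    // If a message has already been dequeued more than twice, treat it as
    // poison and remove it instead of processing it again.
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        if (msg.DequeueCount > 2)
        {
            // Poison message: log it somewhere durable (e.g. a blob or table),
            // then delete it so it stops blocking the queue.
            queue.DeleteMessage(msg);
        }
        else
        {
            ProcessMessage(msg);          // hypothetical helper
            queue.DeleteMessage(msg);
        }
    }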

Queues Recap

• No need to deal with failures → make message processing idempotent
• Invisible messages result in out-of-order delivery → do not rely on order
• Enforce a threshold on a message's dequeue count → use the dequeue count to remove poison messages
• Messages > 8 KB → use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs (see the sketch below)
• Dynamically increase/reduce workers → use the message count to scale
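A minimal sketch of the "messages > 8 KB" and "garbage-collect orphaned blobs" practices, assuming a container and queue set up as in the helper classes later in this deck; method and blob names are illustrative.

    public void EnqueueLargePayload(CloudBlobContainer container, CloudQueue queue,
                                    string payload)
    {
        string blobName = Guid.NewGuid().ToString();
        CloudBlob blob = container.GetBlobReference(blobName);
        blob.UploadText(payload);                          // payload lives in blob storage
        queue.AddMessage(new CloudQueueMessage(blobName)); // message carries only the name
    }

    public string DequeueLargePayload(CloudBlobContainer container, CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return null;

        CloudBlob blob = container.GetBlobReference(msg.AsString);
        string payload = blob.DownloadText();

        blob.Delete();                 // garbage-collect the blob once consumed
        queue.DeleteMessage(msg);
        return payload;
    }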

Windows Azure Storage Takeaways

• Blobs
• Drives
• Tables
• Queues

http://blogs.msdn.com/windowsazurestorage
http://azurescope.cloudapp.net

A Quick Exercise

... Then let's look at some code and some tools.

Code - AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code - BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code - BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

Code - BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code - Table Entities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code - Table Entities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code - TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code - TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table, or some other exception occurred.
    }
}

Code - TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code - TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
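When the table does not fit in memory, one alternative is to ask the table service for a single entity by key instead of scanning everything. This is a minimal sketch under the assumption that it lives in the same TableHelper (same Context, tableName, and PartitionKey/RowKey scheme as above); the method name is illustrative and requires System.Linq.

    public AttendeeEntity GetAttendeeFromCloud(string email)
    {
        // Filter on PartitionKey and RowKey so the table service returns
        // at most one entity, rather than loading the whole table.
        CloudTableQuery<AttendeeEntity> query =
            (from a in Context.CreateQuery<AttendeeEntity>(tableName)
             where a.PartitionKey == "SummerSchool" && a.RowKey == email
             select a).AsTableServiceQuery();

        return query.Execute().FirstOrDefault();
    }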

Pseudo Code - TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools - Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts, measure, and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch below)
    • Data parallelism
    • Task parallelism
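A minimal sketch of the two Task Parallel Library patterns named above; the work items and helper methods are hypothetical placeholders for whatever the role actually processes.

    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class ConcurrencySample
    {
        public static void ProcessAll(IEnumerable<string> workItems)
        {
            // Data parallelism: partition the items across the available cores.
            Parallel.ForEach(workItems, item => DoWork(item));

            // Task parallelism: run independent pieces of work concurrently.
            Task cleanup = Task.Factory.StartNew(() => CleanUpLocalStorage());
            Task report  = Task.Factory.StartNew(() => ReportStatus());
            Task.WaitAll(cleanup, report);
        }

        private static void DoWork(string item) { /* ... */ }
        private static void CleanUpLocalStorage() { /* ... */ }
        private static void ReportStatus() { /* ... */ }
    }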

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately, not over-scaled (a sketch of queue-length-based scaling follows below)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idling VMs (performance vs. cost)
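A minimal sketch of deriving a target instance count from the queue backlog ("use message count to scale"). The ratio and bounds are illustrative assumptions, queueClient is a CloudQueueClient as in the earlier queue sketch, and the actual scaling call to the Service Management API is not shown.

    // Poll the queue's approximate length and derive a target instance count.
    CloudQueue queue = queueClient.GetQueueReference("tasks");
    int backlog = queue.RetrieveApproximateMessageCount();

    // Assumption for illustration: one worker per 100 queued messages,
    // with a floor of 2 and a ceiling of 20 instances.
    int targetInstances = Math.Max(2, Math.Min(20, backlog / 100 + 1));

    // Compare targetInstances with the current count and, if they differ,
    // request a configuration change through the Windows Azure Service
    // Management API (omitted here).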

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places: sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade compute costs for storage size (see the sketch below)
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content → Gzip → compressed content; also minify JavaScript, minify CSS, minify images]
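A minimal sketch of trading compute for storage and bandwidth: gzip a text payload before uploading it to blob storage, using System.IO.Compression from the .NET base class library. The blob usage in the trailing comment follows the earlier samples and is illustrative.

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class CompressionSample
    {
        public static byte[] GzipString(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (MemoryStream output = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);   // compress the UTF-8 bytes
                }
                return output.ToArray();
            }
        }
    }

    // Usage (blob helper assumed, as in the earlier samples):
    //   CloudBlob blob = container.GetBlobReference("attendees.csv.gz");
    //   blob.Properties.ContentEncoding = "gzip";
    //   blob.UploadByteArray(CompressionSample.GzipString(csvText));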

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

... on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona - Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion...

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
  • Query-segmentation data-parallel pattern
    • Split the input sequences
    • Query the partitions in parallel
    • Merge the results together when done
  • Follows the generally suggested application model: Web Role + Queue + Worker
  • With three special considerations:
    • Batch job management
    • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

• Leverage the multiple cores of one instance
  • Argument "-a" of NCBI-BLAST
  • Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes
• Task granularity
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
  • Best practice: use test runs to profile, and set the partition size to mitigate the overhead
• Value of visibilityTimeout for each BLAST task
  • Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting period in case of an instance failure
  • Best practice: estimate the value based on the number of pair-bases in the partition and on test runs, and watch out for the 2-hour maximum limitation

[Diagram: a Splitting task fans out to many BLAST tasks run in parallel, followed by a Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine against a Job Registry kept in Azure Tables; Worker instances pull work from a global dispatch queue; Azure Blob storage holds the NCBI/BLAST databases and temporary data, maintained by a database-updating role]


Web Portal and Web Service

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• An accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the Web Portal and Web Service pass job registrations to the Job Scheduler, which works with the Job Portal, Scaling Engine, and Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

• Blasted ~5,000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time

Discovering Homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• It would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When loads are imbalanced, redistribute the load manually


• Total size of the output result is ~230 GB

• Total number of hits: 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523

3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins

3/31/2010 6:25  RD00155D3611B0  Executing the task 251553

3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins

3/31/2010 6:44  RD00155D3611B0  Executing the task 251600

3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774

3/31/2010 9:50  RD00155D3611B0  Executing the task 251895

3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins
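To analyze the logs at this scale, a small parser that pulls the node, task ID, and duration out of each completion line is enough to reconstruct per-node timelines. Below is a minimal sketch in C#, assuming log lines in the cleaned-up format shown above; the regular expression and the per-node aggregation are illustrative, not the actual AzureBLAST tooling:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    class LogAnalyzer
    {
        // Matches lines such as:
        // "3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins"
        static readonly Regex DoneLine = new Regex(
            @"^(?<date>\S+) (?<time>\S+) (?<node>\S+) Execution of task (?<task>\d+) is done.*?(?<mins>[\d.]+)\s*mins",
            RegexOptions.Compiled);

        static void Main(string[] args)
        {
            Dictionary<string, double> minutesPerNode = new Dictionary<string, double>();

            foreach (string line in File.ReadLines(args[0]))
            {
                Match m = DoneLine.Match(line);
                if (!m.Success)
                    continue;   // skips "Executing the task ..." lines and anything malformed

                string node = m.Groups["node"].Value;
                double mins = double.Parse(m.Groups["mins"].Value);
                if (!minutesPerNode.ContainsKey(node))
                    minutesPerNode[node] = 0;
                minutesPerNode[node] += mins;
            }

            foreach (KeyValuePair<string, double> kv in minutesPerNode)
                Console.WriteLine("{0}: {1:F1} task-minutes", kv.Key, kv.Value);
        }
    }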

North Europe data center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group - this is an update domain (~30 mins, ~6 nodes in one group)

35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed before the job was killed

A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." - Irish proverb

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)

Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)

λv = latent heat of vaporization (J/g)

Rn = net radiation (W m⁻²)

cp = specific heat capacity of air (J kg⁻¹ K⁻¹)

ρa = dry air density (kg m⁻³)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s⁻¹)

gs = conductivity of plant stoma (inverse of rs) (m s⁻¹)

γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:

• Lots of inputs; big data reduction

• Some of the inputs are not so simple

Penman-Monteith (1964):

$ET = \dfrac{\Delta R_n + \rho_a c_p \,\delta q \, g_a}{\left(\Delta + \gamma\,(1 + g_a / g_s)\right)\lambda_v}$
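As a sanity check on the formula, a direct transcription into code is straightforward. The sketch below evaluates the Penman-Monteith expression for a single value; the parameter names mirror the symbol list above, the caller is assumed to supply inputs in mutually consistent units, and this is not taken from the actual MODISAzure reduction code:

    static class EvapotranspirationSketch
    {
        // Minimal Penman-Monteith evaluation for a single grid cell.
        public static double ComputeET(
            double delta,   // Δ : rate of change of saturation specific humidity with temperature
            double rn,      // Rn: net radiation
            double rhoA,    // ρa: dry air density
            double cp,      // cp: specific heat capacity of air
            double dq,      // δq: vapor pressure deficit
            double ga,      // ga: conductivity of air
            double gs,      // gs: conductivity of plant stoma
            double gamma,   // γ : psychrometric constant (≈ 66 Pa/K)
            double lambdaV) // λv: latent heat of vaporization (in units consistent with the rest)
        {
            double numerator = delta * rn + rhoA * cp * dq * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }
    }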

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.

Input datasets:

• NASA MODIS imagery source archives: 5 TB (600K files)

• FLUXNET curated sensor dataset: 30 GB (960 files)

• FLUXNET curated field dataset: 2 KB (1 file)

• NCEP/NCAR reanalysis data: ~100 MB (4K files)

• Vegetative clumping: ~5 MB (1 file)

• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:

• Downloads requested input tiles from NASA FTP sites

• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:

• Converts source tile(s) to intermediate-result sinusoidal tiles

• Simple nearest-neighbor or spline algorithms

Derivation reduction stage:

• First stage visible to the scientist

• Computes ET in our initial use

Analysis reduction stage:

• Optional second stage visible to the scientist

• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; work flows through the Download, Reprojection, Reduction 1, and Reduction 2 queues; source imagery download sites feed the data collection stage, followed by the reprojection, derivation reduction, and analysis reduction stages; source metadata is kept alongside, and scientific results are made available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks - recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

[Flow diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:

• Sandboxes the science or other executable

• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Flow diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pick up the tasks and read from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

A schematic version of this dequeue/execute/retry loop is sketched below.
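The loop below is a hypothetical rendering of that pattern using the Windows Azure StorageClient queue API, not the actual MODISAzure source. The queue name, the use of development storage, and the ExecuteTask helper are illustrative assumptions; the "3 retries" rule is expressed through the message's DequeueCount:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class GenericWorkerSketch
    {
        public void Run()
        {
            // Assumption: development storage and a queue named "reprojectiontask".
            CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
            CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojectiontask");
            taskQueue.CreateIfNotExist();

            while (true)
            {
                // Long visibility timeout so a slow task is not handed to another worker too early.
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null)
                {
                    Thread.Sleep(TimeSpan.FromSeconds(5));
                    continue;
                }

                if (msg.DequeueCount > 3)
                {
                    // The task has already failed three times: stop retrying it.
                    taskQueue.DeleteMessage(msg);
                    continue;
                }

                try
                {
                    ExecuteTask(msg.AsString);      // hypothetical: run the sandboxed executable for this task
                    taskQueue.DeleteMessage(msg);   // success: the task never becomes visible again
                }
                catch (Exception)
                {
                    // Do nothing: the message reappears after the timeout and is retried.
                }
            }
        }

        private void ExecuteTask(string taskDescription)
        {
            // Hypothetical placeholder for task execution and status bookkeeping in Tables.
        }
    }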

[Reprojection flow: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work through the Job Queue and Task Queue to GenericWorker (Worker Role) instances; the workers read Swath Source Data Storage and write Reprojection Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request

• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)

• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile

• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

Pipeline costs by stage:

• Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers - $50 upload, $450 storage

• Reprojection stage: 400 GB, 45K files, 3,500 hours, 20-100 workers - $420 CPU, $60 download

• Derivation reduction stage: 5-7 GB, 55K files, 1,800 hours, 20-100 workers - $216 CPU, $1 download, $6 storage

• Analysis reduction stage: <10 GB, ~1K files, 1,800 hours, 20-100 workers - $216 CPU, $2 download, $9 storage

Total: ~$1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• They provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                  • Day 2 - Azure as a PaaS
                                                                                  • Day 2 - Applications

Message Lifecycle

[Diagram: a Web Role calls PutMessage to add messages (Msg 1-4) to a Queue; Worker Roles call GetMessage with a visibility timeout to dequeue a message and RemoveMessage to delete it once processed.]

PutMessage (REST):

    POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response (REST):

    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Content-Type: application/xml
    Date: Tue, 09 Dec 2008 21:04:30 GMT
    Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

    <?xml version="1.0" encoding="utf-8"?>
    <QueueMessagesList>
      <QueueMessage>
        <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
        <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
        <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
        <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
        <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
        <MessageText>PHRlc3Q+dGdGVzdD4=</MessageText>
      </QueueMessage>
    </QueueMessagesList>

DeleteMessage (REST):

    DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
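The same PutMessage / GetMessage / DeleteMessage lifecycle shown in the REST trace above is normally driven through the StorageClient library rather than raw HTTP. Below is a minimal sketch, assuming development storage and a queue named "myqueue" (both illustrative choices):

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class QueueLifecycleSketch
    {
        static void Main()
        {
            CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount; // assumption
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("myqueue");
            queue.CreateIfNotExist();

            // PutMessage
            queue.AddMessage(new CloudQueueMessage("<test>test</test>"));

            // GetMessage with a 30-second visibility timeout: the message becomes invisible
            // to other consumers until it is deleted or the timeout expires.
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg != null)
            {
                Console.WriteLine(msg.AsString);

                // RemoveMessage: delete only after the work has completed successfully.
                queue.DeleteMessage(msg);
            }
        }
    }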

Truncated Exponential Back-Off Polling

Consider a back-off polling approach: each empty poll increases the polling interval by 2x, and a successful poll resets the interval back to 1.
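A minimal sketch of that truncated exponential back-off loop, reusing the queue object from the sketch above; the 1-second floor, 60-second cap, and the ProcessMessage handler are illustrative assumptions:

    TimeSpan minInterval = TimeSpan.FromSeconds(1);   // reset to this after a successful poll
    TimeSpan maxInterval = TimeSpan.FromSeconds(60);  // truncation: never back off beyond this
    TimeSpan interval = minInterval;

    while (true)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg != null)
        {
            ProcessMessage(msg);          // hypothetical handler
            queue.DeleteMessage(msg);
            interval = minInterval;       // successful poll: reset the interval
        }
        else
        {
            Thread.Sleep(interval);       // empty poll: wait, then double the interval
            interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
        }
    }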

Removing Poison Messages

[Walkthrough with producers P1, P2 and consumers C1, C2:]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) → msg 1
12. DequeueCount > 2
13. C1: DeleteMessage(Q, msg 1) - the poison message is removed
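In code, the poison-message guard from step 12 is a simple check on DequeueCount before processing. A sketch, again reusing the queue object and a hypothetical ProcessMessage handler; the threshold of 2 matches the walkthrough above:

    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        if (msg.DequeueCount > 2)
        {
            // The message has already failed repeatedly: treat it as poison and remove it
            // so it cannot block the queue forever (optionally log it or park it in a blob).
            queue.DeleteMessage(msg);
        }
        else
        {
            ProcessMessage(msg);       // hypothetical handler; may throw
            queue.DeleteMessage(msg);  // delete only after successful processing
        }
    }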

Queues Recap

• Make message processing idempotent - then there is no need to deal with failures

• Do not rely on order - invisible messages result in out-of-order delivery

• Use the dequeue count to remove poison messages - enforce a threshold on a message's dequeue count

• For messages > 8 KB, use a blob to store the message data with a reference in the message - batch messages and garbage-collect orphaned blobs

• Use the message count to scale - dynamically increase/reduce workers (see the sketch below)
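For the last pair - scaling workers from the message count - the queue exposes an approximate length that a monitoring role can poll periodically. A sketch; the thresholds are arbitrary and ScaleOut/ScaleIn stand in for whatever mechanism (for example, the Service Management API) actually changes the instance count:

    int queueLength = queue.RetrieveApproximateMessageCount(); // approximate by design

    if (queueLength > 1000)
        ScaleOut();   // hypothetical: add worker instances
    else if (queueLength == 0)
        ScaleIn();    // hypothetical: remove idle worker instances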

Windows Azure Storage Takeaways

• Blobs

• Drives

• Tables

• Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

A Quick Exercise

…Then let's look at some code and some tools

Code - AccountInformation.cs

    public class AccountInformation
    {
        private static string storageKey = "tHiSiSnOtMyKeY";
        private static string accountName = "jjstore";
        private static StorageCredentialsAccountAndKey credentials;

        internal static StorageCredentialsAccountAndKey Credentials
        {
            get
            {
                if (credentials == null)
                    credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
                return credentials;
            }
        }
    }

Code - BlobHelper.cs

    public class BlobHelper
    {
        private static string defaultContainerName = "school";
        private CloudBlobClient client = null;
        private CloudBlobContainer container = null;

        private void InitContainer()
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

            container = client.GetContainerReference(defaultContainerName);
            container.CreateIfNotExist();

            BlobContainerPermissions permissions = container.GetPermissions();
            permissions.PublicAccess = BlobContainerPublicAccessType.Container;
            container.SetPermissions(permissions);
        }
    }

Code - BlobHelper.cs

    public void WriteFileToBlob(string filePath)
    {
        if (client == null || container == null)
            InitContainer();

        FileInfo file = new FileInfo(filePath);
        CloudBlob blob = container.GetBlobReference(file.Name);
        blob.Properties.ContentType = GetContentType(file.Extension);
        blob.UploadFile(file.FullName);
    }

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

Code - BlobHelper.cs

    public string GetBlobText(string blobName)
    {
        if (client == null || container == null)
            InitContainer();

        CloudBlob blob = container.GetBlobReference(blobName);
        try
        {
            return blob.DownloadText();
        }
        catch (Exception)
        {
            // The blob probably does not exist, or there is no connection available.
            return null;
        }
    }

Application Code - Blobs

    private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
    {
        StringBuilder buff = new StringBuilder();
        buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
        foreach (AttendeeEntity attendee in attendees)
            buff.AppendLine(attendee.ToCsvString());
        blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
    }

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> - or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code - TableEntities

    using Microsoft.WindowsAzure.StorageClient;

    public class AttendeeEntity : TableServiceEntity
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Email { get; set; }
        public DateTime Birthday { get; set; }
        public string FavoriteIceCream { get; set; }
        public int YearsInPhD { get; set; }
        public bool Graduated { get; set; }
        // ...
    }

Code - TableEntities

    public void UpdateFrom(AttendeeEntity other)
    {
        FirstName = other.FirstName;
        LastName = other.LastName;
        Email = other.Email;
        Birthday = other.Birthday;
        FavoriteIceCream = other.FavoriteIceCream;
        YearsInPhD = other.YearsInPhD;
        Graduated = other.Graduated;
        UpdateKeys();
    }

    public void UpdateKeys()
    {
        PartitionKey = "SummerSchool";
        RowKey = Email;
    }

Code - TableHelper.cs

    public class TableHelper
    {
        private CloudTableClient client = null;
        private TableServiceContext context = null;
        private Dictionary<string, AttendeeEntity> allAttendees = null;
        private string tableName = "Attendees";

        private CloudTableClient Client
        {
            get
            {
                if (client == null)
                    client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
                return client;
            }
        }

        private TableServiceContext Context
        {
            get
            {
                if (context == null)
                    context = Client.GetDataServiceContext();
                return context;
            }
        }
    }

Code - TableHelper.cs

    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();
        CloudTableQuery<AttendeeEntity> query =
            Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
        try
        {
            foreach (AttendeeEntity attendee in query)
                allAttendees[attendee.Email] = attendee;
        }
        catch (Exception)
        {
            // No entries in the table - or some other exception.
        }
    }

Code - TableHelper.cs

    public void DeleteAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (!allAttendees.ContainsKey(email))
            return;

        AttendeeEntity attendee = allAttendees[email];

        // Delete from the cloud table.
        Context.DeleteObject(attendee);
        Context.SaveChanges();

        // Delete from the memory cache.
        allAttendees.Remove(email);
    }

Code - TableHelper.cs

    public AttendeeEntity GetAttendee(string email)
    {
        if (allAttendees == null)
            ReadAllAttendees();
        if (allAttendees.ContainsKey(email))
            return allAttendees[email];
        return null;
    }

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

Pseudo Code - TableHelper.cs

    public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
    {
        foreach (AttendeeEntity attendee in updatedAttendees)
            UpdateAttendee(attendee, false);
        Context.SaveChanges(SaveChangesOptions.Batch);
    }

    public void UpdateAttendee(AttendeeEntity attendee)
    {
        UpdateAttendee(attendee, true);
    }

    private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
    {
        if (allAttendees.ContainsKey(attendee.Email))
        {
            AttendeeEntity existingAttendee = allAttendees[attendee.Email];
            existingAttendee.UpdateFrom(attendee);
            Context.UpdateObject(existingAttendee);
        }
        else
        {
            Context.AddObject(tableName, attendee);
        }
        if (saveChanges)
            Context.SaveChanges();
    }

Application Code - Cloud Tables

    private void SaveButton_Click(object sender, RoutedEventArgs e)
    {
        // Write to the table.
        tableHelper.UpdateAttendees(attendees);
    }

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools - Fiddler2

                                                                                    Best Practices

                                                                                    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• The fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                    Using Your VM to the Maximum

Remember:

• 1 role instance == 1 VM running Windows

• 1 role instance = one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using much CPU

• Balance using up CPU against keeping free capacity for times of need

• There are multiple ways to use your CPU to the fullest

                                                                                    Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O completion ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism (see the sketch below)
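For the .NET 4 route, the Task Parallel Library covers both flavors mentioned above. A minimal sketch; the work items and the ProcessItem, DownloadTile, and ReprojectTile helpers are placeholders:

    using System.Threading.Tasks;

    class ConcurrencySketch
    {
        static void ProcessItem(int item) { /* hypothetical per-item work */ }
        static void DownloadTile() { /* hypothetical */ }
        static void ReprojectTile() { /* hypothetical */ }

        static void Main()
        {
            int[] workItems = new int[100];

            // Data parallelism: apply the same operation to every element, across all cores.
            Parallel.For(0, workItems.Length, i => ProcessItem(workItems[i]));

            // Task parallelism: run independent operations concurrently and wait for both.
            Task download = Task.Factory.StartNew(() => DownloadTile());
            Task reproject = Task.Factory.StartNew(() => ReprojectTile());
            Task.WaitAll(download, reproject);
        }
    }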

                                                                                    Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                    Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs (performance vs. cost)

                                                                                    Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app's profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference depending on your app's profile

• Caching and compressing help a lot with storage costs

                                                                                    Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places: sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                    Compressing Content

1. Gzip all output content (see the sketch below)

   • All modern browsers can decompress on the fly

   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

   • Use Portable Network Graphics (PNGs)

   • Crush your PNGs

   • Strip needless metadata

   • Make all PNGs palette PNGs

[Diagram: uncompressed content → Gzip → compressed content; also minify JavaScript, CSS, and images.]
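For item 1, compressing before storing or serving content is a few lines with GZipStream; the sketch below simply gzips a string to bytes, trading a little CPU for smaller storage and bandwidth bills (which is item 2 as well). The class and method names are illustrative:

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static class CompressionHelper
    {
        // Gzip a string to bytes before writing it to blob storage or a response stream.
        public static byte[] GzipString(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (MemoryStream buffer = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);
                }
                return buffer.ToArray();
            }
        }
    }

If the compressed bytes are uploaded to a blob, setting the blob's ContentEncoding property to "gzip" should let browsers decompress the content on the fly.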

                                                                                    Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                    Measure everything

                                                                                    Know your application profile in and out

                                                                                    Research Examples in the Cloud

…on another set of slides

                                                                                    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                    76

                                                                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                    77

Questions and Discussion…

                                                                                    Thank you for hosting me at the Summer School

                                                                                    BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 to 1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

                                                                                    It is easy to parallelize BLAST

• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volumes of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
  • Query-segmentation data-parallel pattern
    • Split the input sequences
    • Query partitions in parallel
    • Merge results together when done
  • Follows the generally suggested application model: Web Role + Queue + Worker (a sketch follows the reference below)
  • With three special considerations
    • Batch job management
    • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
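As a rough illustration of the query-segmentation pattern, the sketch below splits an input sequence list into fixed-size partitions, stores each partition as a blob, and enqueues one queue message per partition for the workers to pick up. The names ("blastpartitions", "blasttasks") and the use of the AccountInformation helper shown later in these slides are assumptions for illustration, not the actual AzureBLAST source.

// Minimal split step for the query-segmentation pattern (names and sizes are assumptions).
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlastSplitter
{
    public void SplitAndEnqueue(IList<string> sequences, int partitionSize)
    {
        CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("blastpartitions");
        container.CreateIfNotExist();

        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        for (int start = 0, id = 0; start < sequences.Count; start += partitionSize, id++)
        {
            // Store the partition itself in blob storage; the queue message only carries a reference.
            string blobName = "partition-" + id;
            string partitionText = string.Join("\n", sequences.Skip(start).Take(partitionSize).ToArray());
            container.GetBlobReference(blobName).UploadText(partitionText);

            taskQueue.AddMessage(new CloudQueueMessage(blobName));
        }
        // A merging task later combines the per-partition BLAST outputs into a single result.
    }
}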

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead

Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
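A minimal sketch of deriving the visibility timeout from an estimated run time; the per-pair-base factor and the safety margin below are placeholders to be calibrated with test runs, not values from AzureBLAST.

// Sketch: derive the visibility timeout from an estimated task run time (constants are assumptions).
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class TimeoutEstimator
{
    public static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition)
    {
        double estimatedMinutes = pairBasesInPartition * 1e-6;      // calibrate with test runs
        double paddedMinutes = estimatedMinutes * 1.5;              // safety margin against repeated work
        return TimeSpan.FromMinutes(Math.Min(paddedMinutes, 120));  // respect the 2-hour maximum
    }
}

// Usage when dequeuing a BLAST task:
//     CloudQueueMessage msg = taskQueue.GetMessage(TimeoutEstimator.EstimateVisibilityTimeout(pairBases));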

[Diagram: a splitting task fans out to BLAST tasks that run in parallel, followed by a merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role hosts the job scheduler, scaling engine, and database updating role; Worker instances pull work from a global dispatch queue; Azure Tables hold the job registry, and Azure Blobs hold the NCBI databases, BLAST databases, temporary data, etc.]

[Diagram: a splitting task fans out to BLAST tasks that run in parallel, followed by a merging task]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the web portal and web service handle job registration into the Job Registry; the job scheduler and scaling engine manage execution]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
  • A total of 9,865,668 sequences to be queried
  • Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments of this scale are usually infeasible for most scientists

• Allocated a total of ~4000 cores (475 extra-large VMs, 8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually

[Chart: VM allocation per deployment across the datacenters (deployments of 62 or 50 instances)]

• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

                                                                                    ET = Water volume evapotranspired (m3 s-1 m-2)

                                                                                    Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

λv = Latent heat of vaporization (J g-1)

                                                                                    Rn = Net radiation (W m-2)

                                                                                    cp = Specific heat capacity of air (J kg-1 K-1)

                                                                                    ρa = dry air density (kg m-3)

                                                                                    δq = vapor pressure deficit (Pa)

                                                                                    ga = Conductivity of air (inverse of ra) (m s-1)

                                                                                    gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Penman-Monteith (1964)

                                                                                    Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
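For concreteness, here is a direct transcription of the Penman-Monteith form above into code. It is a sketch only: it assumes the caller supplies values in the units listed and performs no unit conversion or validation.

// Penman-Monteith ET, transcribed from the formula above (illustrative only, no unit handling).
public static class PenmanMonteith
{
    public static double Evapotranspiration(
        double delta,    // Δ  : Pa K^-1
        double rn,       // Rn : W m^-2
        double rhoA,     // ρa : kg m^-3
        double cp,       // cp : J kg^-1 K^-1
        double deltaQ,   // δq : Pa
        double ga,       // ga : m s^-1
        double gs,       // gs : m s^-1
        double gamma,    // γ  : Pa K^-1 (about 66)
        double lambdaV)  // λv : J g^-1
    {
        return (delta * rn + rhoA * cp * deltaQ * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}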

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

                                                                                    20 US year = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: MODISAzure pipeline. The AzureMODIS Service Web Role Portal receives requests via a Request Queue; a Download Queue feeds the Data Collection Stage, which pulls imagery and source metadata from the Source Imagery Download Sites; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; scientists download the scientific results]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks: recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage between Azure blob storage and local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read and write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
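A minimal sketch of that dequeue/execute/marshal loop. The queue, containers, paths, and the 3-retry threshold below are stand-ins for illustration; the real GenericWorker also persists task status to Azure Tables rather than only deleting messages.

// Sketch of a generic worker loop: dequeue a task, marshal blobs to local files, run the executable,
// upload the output, then delete the message. All names here are illustrative assumptions.
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    public void Run(CloudQueue taskQueue, CloudBlobContainer input, CloudBlobContainer output,
                    string scratchPath, string scienceExePath)
    {
        while (true)
        {
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(TimeSpan.FromSeconds(5)); continue; }

            if (msg.DequeueCount > 3)              // give up after three failed attempts
            {
                taskQueue.DeleteMessage(msg);      // a real system would also record the failure in a status table
                continue;
            }

            string blobName = msg.AsString;
            string localInput = Path.Combine(scratchPath, blobName);
            input.GetBlobReference(blobName).DownloadToFile(localInput);           // marshal input blob to disk

            Process p = Process.Start(scienceExePath, localInput);                 // sandboxed science executable
            p.WaitForExit();

            string localOutput = localInput + ".out";
            output.GetBlobReference(blobName + ".out").UploadFile(localOutput);    // marshal output back to blobs

            taskQueue.DeleteMessage(msg);          // delete only after the result is durably stored
        }
    }
}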

Reprojection Request

[Diagram: a reprojection request flows from the Service Monitor (Worker Role), which persists ReprojectionJobStatus and parses and persists ReprojectionTaskStatus, through the Job Queue and Task Queue to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage; the task status entries point to the SwathGranuleMeta and ScanTimeList tables]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: the MODISAzure pipeline annotated with approximate data volumes and costs per stage]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                    • Day 2 - Azure as a PaaS
                                                                                    • Day 2 - Applications

                                                                                      Truncated Exponential Back Off Polling

Consider a backoff polling approach: each empty poll increases the interval by 2x, and a successful poll sets the interval back to 1 (a minimal sketch follows the diagram below).

                                                                                      44

[Diagram: consumers C1 and C2 polling a queue at increasing intervals]
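A minimal sketch of that polling loop; the 1-second floor and 2-minute cap are illustrative choices, not prescribed values.

// Truncated exponential backoff polling: double the wait after each empty poll, reset after a hit.
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class BackoffPoller
{
    public void Poll(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);
        TimeSpan maxInterval = TimeSpan.FromMinutes(2);     // "truncated": never back off beyond this

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                process(msg);                               // caller-supplied handler
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);         // successful poll: reset the interval
            }
            else
            {
                Thread.Sleep(interval);                     // empty poll: wait, then double the interval
                interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
            }
        }
    }
}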

                                                                                      Removing Poison Messages

[Diagram: producers P1 and P2 enqueue messages 1 and 2; consumers C1 and C2 dequeue them]

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2

                                                                                      45

Removing Poison Messages

[Diagram: producers P1 and P2; consumers C1 and C2 dequeue with a 30 s visibility timeout]

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible 30 s after the dequeue
7. GetMessage(Q, 30 s) → msg 1

                                                                                      46

Removing Poison Messages

[Diagram: producers P1 and P2; consumers C1 and C2; DequeueCount is used to detect the poison message]

1. Dequeue(Q, 30 sec) → msg 1
2. Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible 30 s after the dequeue
7. Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 becomes visible 30 s after the dequeue
10. C1 restarted
11. Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)

                                                                                      Queues Recap

• Make message processing idempotent: no need to deal with failures
• Do not rely on order: invisible messages result in out-of-order delivery
• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
• Use a blob to store message data, with a reference in the message: for messages > 8 KB, batched messages, and garbage-collecting orphaned blobs
• Use the message count to scale: dynamically increase/reduce workers

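A minimal sketch of the DequeueCount check described above; the threshold of 2 mirrors the sequence on the earlier slide, and a real handler might also copy the poison message to a "dead letter" blob or table for inspection before deleting it.

// Poison-message handling: a message that keeps reappearing is deleted after too many dequeues.
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class PoisonMessageHandling
{
    public static void ProcessNext(CloudQueue queue, Action<CloudQueueMessage> process)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > 2)
        {
            // Poison message: it has already failed repeatedly; remove it (optionally dead-letter it first).
            queue.DeleteMessage(msg);
            return;
        }

        process(msg);                // should be idempotent - the message may be processed more than once
        queue.DeleteMessage(msg);    // delete only after successful processing
    }
}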

                                                                                      Windows Azure Storage Takeaways

                                                                                      Blobs

                                                                                      Drives

                                                                                      Tables

                                                                                      Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                                      49

                                                                                      A Quick Exercise

…Then let's look at some code and some tools

                                                                                      50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                      51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                      52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                                      53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}


Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>. In this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
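Because the container was created with public access, anything that can issue an HTTP GET can read the blob. A minimal sketch using the URL above (no Azure SDK needed):

using System;
using System.Net;

class ReadPublicBlob
{
    static void Main()
    {
        using (WebClient web = new WebClient())
        {
            // Plain HTTP GET against the public blob URL shown above
            string csv = web.DownloadString(
                "http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt");
            Console.WriteLine(csv);
        }
    }
}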


Code – TableEntities.cs

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}


Code – TableEntities.cs

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}


Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}


Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}


Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}


Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. It is one of many design patterns for working with tables; a server-side filtered query is sketched below.
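For tables that do not fit in memory, one alternative is to push the key filter to the table service instead of caching everything locally. A minimal sketch using the same StorageClient-era LINQ support inside the TableHelper class (the method name is an assumption):

public AttendeeEntity GetAttendeeByQuery(string email)
{
    // Filter server-side on PartitionKey/RowKey instead of reading the whole table
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.PartitionKey == "SummerSchool" && a.RowKey == email
         select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query)
        return attendee;   // at most one entity matches a full key
    return null;
}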


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}


Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.


Tools – Fiddler2

                                                                                      Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• Pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures needed to take your service down)

• Only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting code into multiple roles, each not using up its CPU

• Balance between using up CPU vs. having free capacity in times of need

• Multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
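A minimal sketch of both flavors with the .NET 4 Task Parallel Library (workItems, Process, CompressOutput, and UploadLogs are placeholder names, not from the slides):

using System.Threading.Tasks;

// Data parallelism: spread independent work items across all cores of the instance
Parallel.ForEach(workItems, item => Process(item));

// Task parallelism: run unrelated operations concurrently
Task compress = Task.Factory.StartNew(() => CompressOutput());
Task upload = Task.Factory.StartNew(() => UploadLogs());
Task.WaitAll(compress, upload);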

                                                                                      Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                      Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in poor user experience

• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of idling VMs (performance vs. cost)

                                                                                      Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                                                                      Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content (see the sketch after this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
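A minimal sketch of gzipping a text response before storing or serving it (standard System.IO.Compression; the helper name is illustrative):

using System.IO;
using System.IO.Compression;
using System.Text;

static byte[] GzipText(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using (MemoryStream output = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            gzip.Write(raw, 0, raw.Length);
        return output.ToArray();   // remember to serve this with Content-Encoding: gzip
    }
}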

(Figure: uncompressed content passes through Gzip to produce compressed content.)

                                                                                      Minify JavaScript

Minify CSS

                                                                                      Minify Images

                                                                                      Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                      Measure everything

                                                                                      Know your application profile in and out

                                                                                      Research Examples in the Cloud

…on another set of slides

                                                                                      Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                      Thank you for hosting me at the Summer School

                                                                                      BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done

• Follows the general suggested application model
  • Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transferring overhead

Best Practice: use test runs to profile, and set the partition size to mitigate the overhead.

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of an instance failure

Best Practice:
• Estimate the value based on the number of pair-bases in the partition and test runs
• Watch out for the 2-hour maximum limitation
(A queue sketch follows below.)
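A minimal sketch of how a worker might dequeue one BLAST task with a visibility timeout sized to the estimated task run time (StorageClient-era API; the queue name and RunBlastTask are illustrative, not from the slides):

CloudQueueClient queueClient =
    new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
CloudQueue taskQueue = queueClient.GetQueueReference("blasttasks");
taskQueue.CreateIfNotExist();

// The message stays hidden for our estimate of the task duration;
// if the instance fails before DeleteMessage, the task reappears for another worker.
CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
if (msg != null)
{
    RunBlastTask(msg.AsString);      // run NCBI-BLAST on the partition named in the message
    taskQueue.DeleteMessage(msg);    // delete only on success
}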

(Figure: a splitting task fans out into many BLAST tasks, which are combined by a merging task.)

Task Size vs. Performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance Size vs. Performance
• Super-linear speedup with larger-size worker instances
• Primarily due to the memory capability

Task Size/Instance Size vs. Cost
• Extra-large instances generated the best and the most economical throughput
• Fully utilize the resource

(Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine, with the Job Registry kept in Azure Tables; a global dispatch queue feeds the Worker roles; Azure Blob storage and a Database Updating Role hold the BLAST/NCBI databases and temporary data; a splitting task fans out BLAST tasks that a merging task combines when done.)

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/Authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production – Eric Schadt (Sage), Sam Phattarasukol (Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalances appear, redistribute the load manually


• Total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6–8 days

• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g. the task failed to complete).

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523

3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins

3/31/2010 6:25 RD00155D3611B0 Executing the task 251553

3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins

3/31/2010 6:44 RD00155D3611B0 Executing the task 251600

3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774

3/31/2010 9:50 RD00155D3611B0 Executing the task 251895

3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total.

All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart. This is an update domain at work.

35 nodes experienced blob writing failures at the same time.

West Europe Data Center: 30,976 tasks were completed and the job was killed.

A reasonable guess: the fault domain is at work.

                                                                                      MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

$$ET = \frac{\Delta R_n + \rho_a c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}$$

                                                                                      Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
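A minimal sketch that simply transcribes the Penman-Monteith formula above into code (inputs in the units listed; nothing here goes beyond the equation itself):

static double Evapotranspiration(
    double delta,    // Δ  : rate of change of saturation specific humidity with temperature (Pa K-1)
    double rn,       // Rn : net radiation (W m-2)
    double rhoA,     // ρa : dry air density (kg m-3)
    double cp,       // cp : specific heat capacity of air (J kg-1 K-1)
    double dq,       // δq : vapor pressure deficit (Pa)
    double ga,       // ga : conductivity of air (m s-1)
    double gs,       // gs : conductivity of plant stoma air (m s-1)
    double lambdaV,  // λv : latent heat of vaporization (J g-1)
    double gamma)    // γ  : psychrometric constant (≈ 66 Pa K-1)
{
    return (delta * rn + rhoA * cp * dq * ga)
         / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
}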

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests via a Request Queue; a Download Queue feeds the Data Collection Stage, which pulls source imagery from NASA download sites; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; source metadata is tracked throughout, and scientists download the scientific results.)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.)

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshals all storage to/from Azure blob storage and to/from local Azure Worker instance files

(Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker roles dequeue tasks from it, reading and writing the <Input>Data Storage.)

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

(A dequeue-and-retry sketch follows below.)
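A minimal, illustrative sketch of that dequeue-and-retry behavior (assuming the queue message's DequeueCount property; ExecuteTask and RecordFailure are placeholders, and the 3-attempt limit mirrors the bullet above):

CloudQueueMessage task = taskQueue.GetMessage(TimeSpan.FromMinutes(60));
if (task != null)
{
    if (task.DequeueCount > 3)
    {
        RecordFailure(task);            // give up after three attempts
        taskQueue.DeleteMessage(task);
    }
    else if (ExecuteTask(task.AsString))
    {
        taskQueue.DeleteMessage(task);  // success: remove so it is not retried
    }
    // On failure the message simply reappears when its visibility timeout expires.
}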

(Diagram: a Reprojection Request flows through the Service Monitor, which persists a ReprojectionJobStatus entity per job request and a ReprojectionTaskStatus entity per task (i.e. per tile), and dispatches tasks to GenericWorkers via the job and task queues. Supporting tables: SwathGranuleMeta is queried for geo-metadata (e.g. boundaries) of each swath tile; ScanTimeList is queried for the list of satellite scan times that cover a target tile. Swath source data and reprojected data live in their own storage.)

• Computational costs driven by data scale and the need to run reduction multiple times

• Storage costs driven by data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate student rates

(The pipeline diagram is repeated with per-stage data sizes, compute hours, worker counts, and costs, summarized below.)

Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
Derivation Reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                      • Day 2 - Azure as a PaaS
                                                                                      • Day 2 - Applications

44

Removing Poison Messages

Producers: P1, P2    Consumers: C1, C2

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2

[Diagram: queue contents and per-message dequeue counts]

45

Removing Poison Messages

Producers: P1, P2    Consumers: C1, C2

1. GetMessage(Q, 30 s) → msg 1
2. GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. GetMessage(Q, 30 s) → msg 1

[Diagram: queue contents and per-message dequeue counts]

46

Removing Poison Messages

Producers: P1, P2    Consumers: C1, C2

1. Dequeue(Q, 30 sec) → msg 1
2. Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after dequeue
10. C1 restarted
11. Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)

[Diagram: queue contents and per-message dequeue counts]
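A minimal sketch of the worker-side pattern walked through above, using the same StorageClient library as the later code slides; the queue name "tasks", the threshold of 2, and the ProcessMessage helper are illustrative assumptions rather than code from the deck.

// Sketch: worker loop that removes poison messages by checking DequeueCount.
// Assumes AccountInformation.Credentials from the later slides; queue name,
// threshold, and ProcessMessage are illustrative placeholders.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class QueueWorker
{
    public static void ProcessLoop()
    {
        CloudQueueClient client =
            new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
        CloudQueue queue = client.GetQueueReference("tasks");
        queue.CreateIfNotExist();

        while (true)
        {
            // The message stays invisible for 30 seconds while we work on it.
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
            if (msg == null) { System.Threading.Thread.Sleep(1000); continue; }

            if (msg.DequeueCount > 2)
            {
                // Poison message: it has already failed repeatedly, so drop it
                // (or copy it to a dead-letter blob/table for later inspection).
                queue.DeleteMessage(msg);
                continue;
            }

            ProcessMessage(msg.AsString);   // must be idempotent
            queue.DeleteMessage(msg);       // delete only after successful processing
        }
    }

    private static void ProcessMessage(string body) { /* application-specific work */ }
}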

Queues Recap

• Make message processing idempotent: no need to deal with failures

• Do not rely on order: invisible messages result in out-of-order delivery

• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count

• Use a blob to store message data, with a reference in the message: for messages > 8 KB (see the sketch after this list)

• Batch messages

• Garbage collect orphaned blobs

• Use the message count to scale: dynamically increase/reduce workers
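A minimal sketch of the "blob reference in the message" practice from the recap; the queue name, the GUID-based blob naming, and the helper structure are illustrative assumptions, not code from the deck.

// Sketch: keep large payloads in blob storage and enqueue only a reference.
// Blob naming and the surrounding structure are illustrative placeholders.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class LargeMessageSender
{
    public static void EnqueueLargePayload(CloudBlobContainer container, CloudQueue queue, string payload)
    {
        // Store the payload (which may exceed the 8 KB queue message limit) in a blob...
        string blobName = Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadText(payload);

        // ...and send only the blob name through the queue.
        queue.AddMessage(new CloudQueueMessage(blobName));
    }

    public static string DequeueLargePayload(CloudBlobContainer container, CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return null;

        CloudBlob blob = container.GetBlobReference(msg.AsString);
        string payload = blob.DownloadText();

        // Delete both once processing succeeds, so orphaned blobs do not accumulate.
        blob.Delete();
        queue.DeleteMessage(msg);
        return payload;
    }
}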

                                                                                        Windows Azure Storage Takeaways

                                                                                        Blobs

                                                                                        Drives

                                                                                        Tables

                                                                                        Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                                        49

                                                                                        A Quick Exercise

…Then let's look at some code and some tools

                                                                                        50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                        51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        // Make the container publicly readable so blobs can be fetched by URL.
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                        52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                                        53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

                                                                                        54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");

    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                        55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    // ...
}

                                                                                        56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                        57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                        58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception
    }
}

                                                                                        59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                        60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                        61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);

    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                                                                        62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                        63

Tools – Fiddler2

                                                                                        Best Practices

                                                                                        Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• Pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• Only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

                                                                                        Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting up code into multiple roles, each not using up its CPU

• Balance between using up CPU vs. having free capacity in times of need

• Multiple ways to use your CPU to the fullest

                                                                                        Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
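A minimal sketch of the two TPL flavors mentioned above; the work items and the ProcessItem/DownloadInput/CleanUpTempStorage methods are illustrative placeholders, not code from the deck.

// Sketch: data parallelism and task parallelism with the .NET 4 Task Parallel Library.
using System;
using System.Threading.Tasks;

public class ConcurrencyExamples
{
    public static void Run(string[] workItems)
    {
        // Data parallelism: apply the same operation to every element,
        // letting the TPL decide how many threads/cores to use.
        Parallel.ForEach(workItems, item => ProcessItem(item));

        // Task parallelism: run different operations concurrently.
        Task download = Task.Factory.StartNew(() => DownloadInput());
        Task cleanup  = Task.Factory.StartNew(() => CleanUpTempStorage());
        Task.WaitAll(download, cleanup);
    }

    private static void ProcessItem(string item) { /* CPU-bound work */ }
    private static void DownloadInput() { /* network-bound work */ }
    private static void CleanUpTempStorage() { /* storage-bound work */ }
}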

                                                                                        Finding Good Code Neighbors

• Typically, code falls into one or more of these categories:

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                        Memory Intensive

                                                                                        CPU Intensive

                                                                                        Network IO Intensive

                                                                                        Storage IO Intensive

                                                                                        Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in poor user experience

• Trade-off between the risk of failure/poor user experience due to not having excess capacity and the cost of idling VMs (see the sketch after this list for a simple queue-length heuristic)
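A minimal sketch of one way to drive the scaling decision from queue backlog, as suggested in the earlier queue recap; the thresholds and the assumption that the result is then applied separately (for example via the Windows Azure Service Management API) are illustrative choices, not guidance from the deck.

// Sketch: derive a target worker count from the queue backlog.
// Thresholds are illustrative; actually changing the instance count is done
// elsewhere (e.g. through the Service Management API) and is not shown here.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class ScalingHeuristic
{
    public static int TargetWorkerCount(CloudQueue queue, int current, int min, int max)
    {
        int backlog = queue.RetrieveApproximateMessageCount();

        // Roughly one worker per 100 queued messages, clamped to [min, max].
        int target = Math.Max(min, Math.Min(max, backlog / 100 + 1));

        // Scale up quickly, scale down slowly to avoid thrashing.
        if (target < current)
            target = Math.Max(target, current - 1);

        return target;
    }
}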

                                                                                        Performance Cost

                                                                                        Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                                                                        Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                        Compressing Content

1. Gzip all output content (see the sketch after the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed content → Gzip / Minify JavaScript / Minify CSS / Minify images → Compressed content]
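A minimal sketch of gzipping text content before storing or serving it, using the standard .NET GZipStream class; the helper name is an illustrative assumption, and the surrounding web framework wiring (setting the Content-Encoding header) is not shown.

// Sketch: compress a string with GZip before uploading it or sending it over the wire.
// A browser can decompress this on the fly when the response carries
// the "Content-Encoding: gzip" header.
using System.IO;
using System.IO.Compression;
using System.Text;

public class CompressionHelper
{
    public static byte[] GzipText(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The GZip stream is flushed and closed before reading the bytes out.
            return output.ToArray();
        }
    }
}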

                                                                                        Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                        Measure everything

                                                                                        Know your application profile in and out

                                                                                        Research Examples in the Cloud

…on another set of slides

                                                                                        Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                        76

                                                                                        Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx

                                                                                        77

                                                                                        Questions and Discussionhellip

                                                                                        Thank you for hosting me at the Summer School

                                                                                        BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large volume of data

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern (see the sketch after the citation below)
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done

• Follows the general suggested application model
  • Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010
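A minimal sketch of the query-segmentation (split/join) pattern described above, under the general Web Role + Queue + Worker model; the partition size, blob naming, and helper structure are illustrative assumptions rather than AzureBLAST's actual code.

// Sketch: split an input set of query sequences into fixed-size partitions,
// enqueue one BLAST task per partition, and merge partial results at the end.
// Partition size, blob names, and helpers are illustrative placeholders.
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.StorageClient;

public class QuerySegmentation
{
    public static void SubmitJob(List<string> sequences, CloudBlobContainer container, CloudQueue taskQueue)
    {
        const int partitionSize = 100;   // e.g. 100 sequences per partition

        for (int i = 0; i < sequences.Count; i += partitionSize)
        {
            int count = Math.Min(partitionSize, sequences.Count - i);
            string partitionBlob = "partition-" + (i / partitionSize);

            // Store the partition in blob storage and enqueue a reference to it.
            container.GetBlobReference(partitionBlob)
                     .UploadText(string.Join("\n", sequences.GetRange(i, count).ToArray()));
            taskQueue.AddMessage(new CloudQueueMessage(partitionBlob));
        }

        // A merging task would later download all result blobs and concatenate them.
    }
}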

A simple split/join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead

Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting period in case of an instance failure

Best practice (see the sketch after this list):
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
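A minimal sketch of choosing a visibility timeout from an estimated task run time; the per-pair-base rate, the headroom factor, and the 30-second floor are hypothetical calibration values, not numbers from the deck.

// Sketch: pick a visibility timeout from an estimated BLAST task run time.
// secondsPerMillionPairBases would come from calibration test runs; the 2-hour
// cap reflects the queue service's maximum visibility timeout noted above.
using System;

public class VisibilityTimeoutEstimator
{
    public static TimeSpan Estimate(long pairBasesInPartition, double secondsPerMillionPairBases)
    {
        double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;

        // Add headroom so a slightly slow task is not handed out twice,
        // but never exceed the 2-hour maximum.
        double withHeadroom = estimatedSeconds * 1.5;
        double capped = Math.Min(withHeadroom, TimeSpan.FromHours(2).TotalSeconds);

        return TimeSpan.FromSeconds(Math.Max(30, capped));
    }
}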

                                                                                        BLAST task

                                                                                        Splitting task

                                                                                        BLAST task

                                                                                        BLAST task

                                                                                        BLAST task

                                                                                        hellip

                                                                                        Merging Task

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resource

[AzureBLAST architecture diagram labels: Web Portal, Web Service, Job registration, Job Scheduler, Web Role, Job Management Role, Scaling Engine, Global dispatch queue, Workers, Database updating Role, Azure Table (Job Registry), Azure Blob (BLAST databases, temporary data, etc.), NCBI databases; per-job tasks: Splitting task, BLAST tasks, Merging task]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

                                                                                        Web Portal

                                                                                        Web Service

                                                                                        Job registration

                                                                                        Job Scheduler

                                                                                        Job Portal

                                                                                        Scaling Engine

                                                                                        Job Registry

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalances occur, redistribute the load manually

[Diagram: instance counts for the 8 AzureBLAST deployments]

• Total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group: this is an update domain

~30 mins

~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed, and the job was killed

A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." - Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)

Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = latent heat of vaporization (J g-1)

Rn = net radiation (W m-2)

cp = specific heat capacity of air (J kg-1 K-1)

ρa = dry air density (kg m-3)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s-1)

gs = conductivity of plant stoma air (inverse of rs) (m s-1)

γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

Penman-Monteith (1964)
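A minimal sketch of evaluating the Penman-Monteith formula as written above; the class and parameter names are illustrative, and consistent units (as listed in the definitions) are assumed to be the caller's responsibility.

// Sketch: straightforward evaluation of the Penman-Monteith equation shown above.
// Parameter names mirror the symbols on the slide.
public class Evapotranspiration
{
    public static double PenmanMonteith(
        double delta,   // Δ  : rate of change of saturation specific humidity with temperature (Pa/K)
        double rn,      // Rn : net radiation (W/m^2)
        double rhoA,    // ρa : dry air density (kg/m^3)
        double cp,      // cp : specific heat capacity of air (J/(kg K))
        double dq,      // δq : vapor pressure deficit (Pa)
        double ga,      // ga : conductivity of air (m/s)
        double gs,      // gs : conductivity of plant stoma (m/s)
        double gamma,   // γ  : psychrometric constant (~66 Pa/K)
        double lambdaV) // λv : latent heat of vaporization (J/g)
    {
        double numerator = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;   // ET
    }
}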

                                                                                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

NASA MODIS imagery source archives: 5 TB (600K files)

FLUXNET curated sensor dataset: 30 GB (960 files)

FLUXNET curated field dataset: 2 KB (1 file)

NCEP/NCAR: ~100 MB (4K files)

Vegetative clumping: ~5 MB (1 file)

Climate classification: ~1 MB (1 file)

                                                                                        20 US year = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Architecture diagram: scientists submit requests through the AzureMODIS Service Web Role Portal into a Request Queue; the Data Collection Stage pulls imagery from Source Imagery Download Sites via a Download Queue; the Reprojection Stage (Reprojection Queue), Derivation Reduction Stage (Reduction 1 Queue), and Analysis Reduction Stage (Reduction 2 Queue) follow, with Source Metadata kept in tables and Scientific Results available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes science or other executables
• Marshalls all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status (a minimal dequeue-loop sketch follows below)
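A minimal sketch of such a generic-worker loop, assuming the StorageClient library used in the code samples later in this deck and reusing the AccountInformation helper shown there; the queue name and RunTask are hypothetical, and this is not the actual MODISAzure code:

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;

    public class GenericWorker : RoleEntryPoint
    {
        public override void Run()
        {
            CloudQueueClient queueClient =
                new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudQueueClient();
            CloudQueue taskQueue = queueClient.GetQueueReference("reprojection-tasks"); // hypothetical name
            taskQueue.CreateIfNotExist();

            while (true)
            {
                // Hide the message for roughly the task's estimated run time.
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null)
                {
                    System.Threading.Thread.Sleep(5000);
                    continue;
                }

                if (msg.DequeueCount > 3)
                {
                    // Already retried 3 times: record the failure elsewhere and drop the task.
                    taskQueue.DeleteMessage(msg);
                    continue;
                }

                RunTask(msg.AsString);          // hypothetical: run the sandboxed executable
                taskQueue.DeleteMessage(msg);   // delete only after successful completion
            }
        }

        private void RunTask(string taskDescription) { /* ... */ }
    }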

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus; the Service Monitor (Worker Role) parses and persists ReprojectionTaskStatus and dispatches to the Task Queue, where GenericWorker (Worker Role) instances pick up tasks that point into Reprojection Data Storage and Swath Source Data Storage, with ScanTimeList and SwathGranuleMeta tables alongside.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request

• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)

• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile

• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs driven by data scale and the need to run reductions multiple times

• Storage costs driven by data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate-student rates

[Same pipeline diagram as above (AzureMODIS Service Web Role Portal, Request/Download/Reprojection/Reduction queues, Source Metadata, Scientific Results Download), annotated with scale and cost per stage:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage

Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                        • Day 2 - Azure as a PaaS
                                                                                        • Day 2 - Applications

45

Removing Poison Messages

[Diagram: producers P1, P2 enqueue messages; consumers C1, C2 dequeue from queue Q]

1. C1: GetMessage(Q, 30 s) → msg 1
2. C2: GetMessage(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after dequeue
7. C2: GetMessage(Q, 30 s) → msg 1

46

Removing Poison Messages

[Diagram: producers P1, P2; consumers C1, C2; queue Q]

1. C1: Dequeue(Q, 30 s) → msg 1
2. C2: Dequeue(Q, 30 s) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 visible 30 s after dequeue
7. C2: Dequeue(Q, 30 s) → msg 1
8. C2 crashed
9. msg 1 visible 30 s after dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 s) → msg 1
12. DequeueCount > 2
13. Delete(Q, msg 1)

Queues Recap

• Make message processing idempotent: no need to deal with failures

• Do not rely on order: invisible messages result in out-of-order delivery

• Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count

• Messages > 8 KB: use a blob to store the message data, with a reference in the message; batch messages; garbage-collect orphaned blobs (a sketch of this blob-backed pattern follows below)

• Use the message count to scale: dynamically increase/reduce workers
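A minimal sketch of the blob-backed large-message pattern from the recap, assuming the StorageClient library and the AccountInformation helper from the samples below; the container and queue names are hypothetical:

    // Hedged sketch: the payload goes to a blob, the queue message carries only the blob name.
    string payload = "...";                          // imagine > 8 KB of data here
    string blobName = Guid.NewGuid().ToString();

    CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);
    CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("messages"); // hypothetical
    container.CreateIfNotExist();
    container.GetBlobReference(blobName).UploadText(payload);

    CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("work");  // hypothetical
    queue.CreateIfNotExist();
    queue.AddMessage(new CloudQueueMessage(blobName));

    // Consumer side: fetch the blob named in the message, process it idempotently,
    // then clean up both the message and the blob. Orphaned blobs (left behind when
    // a consumer crashes between the two deletes) are garbage-collected separately.
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
    if (msg != null)
    {
        string data = container.GetBlobReference(msg.AsString).DownloadText();
        // ... process data ...
        queue.DeleteMessage(msg);
        container.GetBlobReference(msg.AsString).Delete();
    }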

Windows Azure Storage Takeaways

Blobs, Drives, Tables, Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                                          49

                                                                                          A Quick Exercise

…Then let's look at some code and some tools

                                                                                          50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                          51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                          52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                                          53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                                                                          54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at:
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case:
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                          55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                                                          56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                          57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                          58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception.
    }
}

                                                                                          59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache.
    allAttendees.Remove(email);
}

                                                                                          60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or query results) that easily fit in memory. This is one of many design patterns for working with tables; an alternative, server-side filtered query is sketched below.
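When the table is too large to cache, a hedged alternative sketch (not part of the original sample) is to push the filter down to the table service and look up a single entity by its keys:

    // Hedged sketch: fetch one attendee by PartitionKey/RowKey instead of caching the whole table.
    // Requires System.Linq for FirstOrDefault().
    public AttendeeEntity GetAttendeeByQuery(string email)
    {
        CloudTableQuery<AttendeeEntity> query =
            (from a in Context.CreateQuery<AttendeeEntity>(tableName)
             where a.PartitionKey == "SummerSchool" && a.RowKey == email
             select a).AsTableServiceQuery();
        return query.Execute().FirstOrDefault();
    }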

                                                                                          61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                          62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                          63

Tools – Fiddler2

                                                                                          Best Practices

                                                                                          Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                          Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows

• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using much CPU

• Balance between using up CPU and having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

                                                                                          Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch below)
    • Data parallelism
    • Task parallelism
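A minimal Task Parallel Library sketch showing both flavors; the work items and DoWork are hypothetical stand-ins for real workloads:

    using System;
    using System.Threading.Tasks;

    class ConcurrencyExamples
    {
        static void Main()
        {
            // Data parallelism: apply the same operation to many items,
            // letting the runtime pick the degree of parallelism.
            int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };
            Parallel.ForEach(workItems, item => DoWork(item));

            // Task parallelism: run different operations concurrently.
            Task download = Task.Factory.StartNew(() => DoWork(100));  // e.g. fetch a blob
            Task compute  = Task.Factory.StartNew(() => DoWork(200));  // e.g. crunch numbers
            Task.WaitAll(download, compute);
        }

        static void DoWork(int item)
        {
            Console.WriteLine("Processed item {0}", item);
        }
    }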

                                                                                          Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                          Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale (see the sketch below)

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs (performance vs. cost)
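A hedged sketch of the measurement half of auto-scaling: read the approximate queue length and derive a target instance count. The queue name and thresholds are made up, and actually changing the instance count would go through the Windows Azure Service Management API, which is not shown:

    // Hedged sketch: derive a target worker count from queue depth.
    CloudQueue workQueue =
        new CloudStorageAccount(AccountInformation.Credentials, false)
            .CreateCloudQueueClient()
            .GetQueueReference("work");        // hypothetical queue name
    workQueue.CreateIfNotExist();

    int backlog = workQueue.RetrieveApproximateMessageCount();

    // Made-up policy: one worker per 100 queued messages, between 2 and 20 workers.
    int targetWorkers = Math.Max(2, Math.Min(20, backlog / 100 + 1));

    // Applying targetWorkers would mean updating the service configuration's
    // instance count via the Service Management API (omitted here).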

                                                                                          Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference based on your app profile

• Caching and compressing: they help a lot with storage costs

                                                                                          Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's bill

• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places

• Sending fewer things also means your VM has time to do other tasks

• All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                          Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip, minify JavaScript, minify CSS, minify images → Compressed Content]

(a compression sketch follows below)
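A minimal sketch of gzip-compressing text before writing it to blob storage; the container and blob names are up to the caller, and the content-type/encoding headers you need depend on how the content is served:

    using System.IO;
    using System.IO.Compression;
    using System.Text;
    using Microsoft.WindowsAzure.StorageClient;

    static void WriteCompressedTextToBlob(CloudBlobContainer container, string blobName, string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (MemoryStream buffer = new MemoryStream())
        {
            // Compress into the memory buffer, leaving the buffer open for the upload.
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            buffer.Position = 0;

            CloudBlob blob = container.GetBlobReference(blobName);
            blob.Properties.ContentType = "text/plain";
            blob.Properties.ContentEncoding = "gzip";   // lets browsers decompress on the fly
            blob.UploadFromStream(buffer);
        }
    }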

                                                                                          Best Practices Summary

• Doing 'less' is the key to saving costs

• Measure everything

• Know your application profile inside and out

                                                                                          Research Examples in the Cloud

…on another set of slides

                                                                                          Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                          76

                                                                                          Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                          77

Questions and Discussion…

                                                                                          Thank you for hosting me at the Summer School

                                                                                          BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: test runs to profile, then set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting time in case of an instance failure
• Best practice (see the sketch below):
  • Estimate the value based on the number of pair-bases in the partition and test runs
  • Watch out for the 2-hour maximum limitation
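A hedged sketch of how such an estimate might be applied when dequeuing a BLAST task; the per-pair-base rate is a made-up placeholder that would come from test runs, and the 2-hour cap reflects the queue limit mentioned above:

    // Hedged sketch: size the visibility timeout from an estimated run time,
    // capped at the 2-hour maximum for queue messages.
    static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition)
    {
        const double secondsPerMillionPairBases = 40.0;   // made-up calibration from test runs
        double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;

        // Add head-room, then clamp to the 2-hour limit.
        double padded = estimatedSeconds * 1.5;
        return TimeSpan.FromSeconds(Math.Min(padded, TimeSpan.FromHours(2).TotalSeconds));
    }

    // Usage (queue name hypothetical):
    // CloudQueueMessage task = blastTaskQueue.GetMessage(EstimateVisibilityTimeout(partitionPairBases));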

[Diagram: Splitting task → BLAST tasks run in parallel … → Merging task]

(a splitting sketch follows below)
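A minimal sketch of the splitting step in this pattern: group input sequences into fixed-size partitions, store each partition as a blob, and enqueue one BLAST task per partition. The partition size, blob naming, and helper names are illustrative assumptions, not the AzureBLAST implementation:

    using System.Collections.Generic;
    using Microsoft.WindowsAzure.StorageClient;

    static void SplitAndEnqueue(IEnumerable<string> sequences, CloudQueue blastTaskQueue, CloudBlobContainer partitions)
    {
        const int sequencesPerPartition = 100;   // illustrative granularity; tune via test runs
        List<string> batch = new List<string>();
        int partitionIndex = 0;

        foreach (string seq in sequences)
        {
            batch.Add(seq);
            if (batch.Count == sequencesPerPartition)
            {
                EnqueuePartition(batch, partitionIndex++, blastTaskQueue, partitions);
                batch.Clear();
            }
        }
        if (batch.Count > 0)
            EnqueuePartition(batch, partitionIndex, blastTaskQueue, partitions);
    }

    static void EnqueuePartition(List<string> batch, int index, CloudQueue queue, CloudBlobContainer partitions)
    {
        string blobName = "partition-" + index + ".fasta";                       // hypothetical naming
        partitions.GetBlobReference(blobName).UploadText(string.Join("\n", batch.ToArray()));
        queue.AddMessage(new CloudQueueMessage(blobName));                       // a worker dequeues this and runs BLAST
    }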

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resource

[Architecture diagram: a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and tracks jobs in a Job Registry (Azure Table); worker instances pull from a global dispatch queue; Azure Blob storage holds the BLAST databases, temporary data, etc.; a database-updating role refreshes the NCBI databases.]


ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (see the sketch below)
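As a minimal sketch of that idea (the property names and keys here are assumptions for illustration, not the actual AzureBLAST schema), a job registry entry can be modeled with the same TableServiceEntity pattern used in the code later in this deck:

using System;
using Microsoft.WindowsAzure.StorageClient;

// Persisting job state in a table means a web or worker role can crash and
// restart without losing track of accepted jobs.
public class JobRegistryEntity : TableServiceEntity
{
    public string JobName { get; set; }
    public string Status { get; set; }        // e.g. Accepted, Running, Completed, Failed
    public DateTime SubmittedAt { get; set; }
    public string SubmittedBy { get; set; }   // Live ID of the submitting user

    public JobRegistryEntity() { }            // required for deserialization

    public JobRegistryEntity(string jobId)
    {
        PartitionKey = "Jobs";                // a single partition keeps the sketch simple
        RowKey = jobId;
        Status = "Accepted";
        SubmittedAt = DateTime.UtcNow;
    }
}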

[Diagram: the Job Portal (Web Portal / Web Service) performs job registration into the Job Registry, which feeds the Job Scheduler and the Scaling Engine]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
• 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like this:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

A simple log scan (see the sketch below) is enough to flag tasks such as 251774 that start but never report completion.
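As a minimal sketch of such a scan (assuming the log format shown above; this is illustrative tooling, not the actual AzureBLAST scripts), the following reports task IDs that were started but never marked as done:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogScanner
{
    public static void ReportIncompleteTasks(string logPath)
    {
        var started = new HashSet<string>();
        var completed = new HashSet<string>();

        // Match "Executing the task <id>" and "Execution of task <id> is done"
        var startPattern = new Regex(@"Executing the task (\d+)");
        var donePattern = new Regex(@"Execution of task (\d+) is done");

        foreach (string line in File.ReadLines(logPath))
        {
            Match m = startPattern.Match(line);
            if (m.Success) started.Add(m.Groups[1].Value);

            m = donePattern.Match(line);
            if (m.Success) completed.Add(m.Groups[1].Value);
        }

        started.ExceptWith(completed);
        foreach (string taskId in started)
            Console.WriteLine("Task {0} started but never completed", taskId);
    }
}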

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group: this is an Update Domain
• ~30 mins; ~6 nodes in one group
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks completed, then the job was killed
• A reasonable guess: this is a Fault Domain at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, i.e. evaporation through plant membranes, by plants. A small code sketch of the formula above follows below.
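For concreteness, here is a minimal sketch of the Penman-Monteith formula above as a C# helper (an illustration only, not code from MODISAzure; unit consistency is left to the caller):

public static class Evapotranspiration
{
    // Penman-Monteith (1964):
    //   ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    // Arguments are assumed to already be in the units listed above.
    public static double PenmanMonteith(
        double delta,                        // Δ  (Pa K-1)
        double netRadiation,                 // Rn (W m-2)
        double airDensity,                   // ρa (kg m-3)
        double specificHeat,                 // cp (J kg-1 K-1)
        double vaporPressureDeficit,         // δq (Pa)
        double airConductivity,              // ga (m s-1)
        double stomatalConductivity,         // gs (m s-1)
        double latentHeat,                   // λv (J g-1)
        double psychrometricConstant = 66.0) // γ  (Pa K-1)
    {
        double numerator = delta * netRadiation
                         + airDensity * specificHeat * vaporPressureDeficit * airConductivity;
        double denominator = (delta + psychrometricConstant * (1.0 + airConductivity / stomatalConductivity))
                           * latentHeat;
        return numerator / denominator;
    }
}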

Input data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[MODISAzure pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal (Request Queue); the Data Collection Stage (Download Queue) pulls source imagery from the download sites, feeding the Reprojection Stage (Reprojection Queue), the Derivation Reduction Stage (Reduction 1 Queue) and the Analysis Reduction Stage (Reduction 2 Queue); source metadata is tracked alongside, and scientific results are available for download]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
• Receives all user requests
• Queues requests to the appropriate Download, Reprojection, or Reduction Job Queue

• Service Monitor is a dedicated Worker Role
• Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables (see the sketch below)
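A minimal sketch of that parse-and-dispatch pattern (the entity, table and queue names are hypothetical, not the actual MODISAzure schema): the job is split into per-tile tasks, each task's status is persisted to a table, and one message per task is pushed onto a task queue.

using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class TaskStatusEntity : TableServiceEntity
{
    public string Status { get; set; }      // e.g. Queued, Running, Done, Failed

    public TaskStatusEntity() { }
    public TaskStatusEntity(string jobId, string taskId)
    {
        PartitionKey = jobId;               // all tasks of a job share one partition
        RowKey = taskId;
        Status = "Queued";
    }
}

public class ServiceMonitorSketch
{
    private readonly CloudStorageAccount account;

    public ServiceMonitorSketch(CloudStorageAccount account) { this.account = account; }

    // Parse a job into per-tile tasks, persist their status, and dispatch them.
    public void DispatchJob(string jobId, IEnumerable<string> tileIds)
    {
        CloudTableClient tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("TaskStatus");
        TableServiceContext context = tableClient.GetDataServiceContext();

        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojection-tasks");
        taskQueue.CreateIfNotExist();

        foreach (string tileId in tileIds)
        {
            context.AddObject("TaskStatus", new TaskStatusEntity(jobId, tileId));
            taskQueue.AddMessage(new CloudQueueMessage(jobId + "|" + tileId));
        }
        context.SaveChanges();
    }
}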

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work onto the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files (see the sketch below)
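A minimal sketch of that marshalling pattern (the executable, container and file names are placeholders, not the actual MODISAzure code): the worker copies its input blob to local disk, runs the sandboxed executable against it, and uploads the result back to blob storage.

using System.Diagnostics;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerSketch
{
    private readonly CloudBlobContainer container;

    public GenericWorkerSketch(CloudStorageAccount account)
    {
        container = account.CreateCloudBlobClient().GetContainerReference("tiles");
        container.CreateIfNotExist();
    }

    public void RunTask(string inputBlobName, string outputBlobName, string localWorkDir)
    {
        string inputFile = Path.Combine(localWorkDir, "input.hdf");
        string outputFile = Path.Combine(localWorkDir, "output.tif");

        // 1. Marshal input from blob storage to the local instance disk.
        container.GetBlobReference(inputBlobName).DownloadToFile(inputFile);

        // 2. Run the sandboxed science executable against the local files.
        var psi = new ProcessStartInfo("reproject.exe", "\"" + inputFile + "\" \"" + outputFile + "\"")
        {
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
        }

        // 3. Marshal the result back to blob storage.
        container.GetBlobReference(outputBlobName).UploadFile(outputFile);
    }
}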

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Reprojection pipeline diagram: a Reprojection Request is parsed by the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), and dispatches work from the Job Queue to the Task Queue for GenericWorker (Worker Role) instances. Tasks point into Reprojection Data Storage and Swath Source Data Storage; the SwathGranuleMeta table is queried for geo-metadata (e.g. boundaries) for each swath tile, and the ScanTimeList table is queried for the list of satellite scan times that cover a target tile.]

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates


Approximate scale and cost by pipeline stage (total: $1,420):

• Data Collection Stage: 400–500 GB, 60K files; 10 MB/sec, 11 hours; <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files; 3,500 compute hours; 20–100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5–7 GB, 55K files; 1,800 compute hours; 20–100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files; 1,800 compute hours; 20–100 workers; $216 CPU, $2 download, $9 storage

• Clouds are the largest-scale computer centers ever constructed, and they have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                          • Day 2 - Azure as a PaaS
                                                                                          • Day 2 - Applications

46

Removing Poison Messages

Producers: P1, P2; Consumers: C1, C2

1. C1: Dequeue(Q, 30 sec) → msg 1
2. C2: Dequeue(Q, 30 sec) → msg 2
3. C2 consumed msg 2
4. C2: Delete(Q, msg 2)
5. C1 crashed
6. msg 1 becomes visible again 30 s after its Dequeue
7. C2: Dequeue(Q, 30 sec) → msg 1
8. C2 crashed
9. msg 1 becomes visible again 30 s after its Dequeue
10. C1 restarted
11. C1: Dequeue(Q, 30 sec) → msg 1
12. DequeueCount > 2
13. C1: Delete(Q, msg 1)
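A minimal sketch of that pattern against the StorageClient queue API used elsewhere in this deck (the queue handling and threshold are illustrative choices): any message whose DequeueCount exceeds the threshold is removed instead of being retried forever.

using System;
using Microsoft.WindowsAzure.StorageClient;

public class PoisonMessageSketch
{
    private const int MaxDequeueCount = 2;

    public void ProcessOne(CloudQueue queue)
    {
        // The message becomes invisible for 30 seconds while we work on it.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;    // queue is empty

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Poison message: it has already failed repeatedly, so remove it
            // (optionally log it or copy it to a dead-letter blob/table first).
            queue.DeleteMessage(msg);
            return;
        }

        DoWork(msg.AsString);       // must be idempotent: it may run more than once
        queue.DeleteMessage(msg);   // only delete after successful processing
    }

    private void DoWork(string payload) { /* application-specific processing */ }
}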

Queues Recap

• No need to deal with failures: make message processing idempotent
• Invisible messages result in out-of-order delivery: do not rely on order
• Enforce a threshold on a message's dequeue count: use the DequeueCount to remove poison messages
• Messages > 8 KB: use a blob to store the message data, with a reference in the message (see the sketch below); batch messages; garbage-collect orphaned blobs
• Dynamically increase/reduce workers: use the message count to scale
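A minimal sketch of the "payload in a blob, reference in the message" idea (the container and queue names are illustrative):

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class LargeMessageSketch
{
    private readonly CloudBlobContainer container;
    private readonly CloudQueue queue;

    public LargeMessageSketch(CloudStorageAccount account)
    {
        container = account.CreateCloudBlobClient().GetContainerReference("message-payloads");
        container.CreateIfNotExist();
        queue = account.CreateCloudQueueClient().GetQueueReference("work");
        queue.CreateIfNotExist();
    }

    // Producer: put the (possibly > 8 KB) payload in a blob, enqueue only its name.
    public void Enqueue(string payload)
    {
        string blobName = Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadText(payload);
        queue.AddMessage(new CloudQueueMessage(blobName));
    }

    // Consumer: resolve the reference, process, then clean up blob and message.
    // Orphaned blobs left behind by crashes still need periodic garbage collection.
    public void DequeueAndProcess(Action<string> process)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        CloudBlob blob = container.GetBlobReference(msg.AsString);
        process(blob.DownloadText());

        blob.Delete();
        queue.DeleteMessage(msg);
    }
}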


Windows Azure Storage Takeaways

Blobs – Drives – Tables – Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

49

                                                                                            A Quick Exercise

…Then let's look at some code and some tools

                                                                                            50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                            51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                            52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                                            53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

                                                                                            54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                            55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

                                                                                            56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                            57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                            58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                            59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                            60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                            61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                            62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                            63

Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which barely uses the CPU
• Balance using up the CPU vs. keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
• Data parallelism
• Task parallelism
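A minimal .NET 4 sketch of both flavors (the work items here are placeholders):

using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    public static void Run(string[] inputBlobNames)
    {
        // Data parallelism: apply the same operation to every element,
        // letting the TPL decide how many threads/cores to use.
        Parallel.ForEach(inputBlobNames, blobName =>
        {
            ProcessBlob(blobName);
        });

        // Task parallelism: run independent operations concurrently.
        Task cleanup = Task.Factory.StartNew(() => DeleteTemporaryFiles());
        Task report = Task.Factory.StartNew(() => WriteStatusReport());
        Task.WaitAll(cleanup, report);
    }

    private static void ProcessBlob(string name) { /* CPU-bound work per item */ }
    private static void DeleteTemporaryFiles() { /* independent background work */ }
    private static void WriteStatusReport() { /* independent background work */ }
}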

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find code that is intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale (see the sketch below)
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idle VMs (performance vs. cost)
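As a minimal sketch of one common policy (assuming the v1.x StorageClient's RetrieveApproximateMessageCount; the thresholds are illustrative, and actually changing the instance count would go through the Windows Azure Service Management API, which is not shown here):

using Microsoft.WindowsAzure.StorageClient;

public static class ScalingPolicySketch
{
    // Decide on a target worker count from the backlog in the work queue.
    // Hysteresis (different scale-up and scale-down thresholds) avoids
    // constantly spinning instances up and down.
    public static int TargetWorkerCount(CloudQueue workQueue, int currentWorkers)
    {
        int backlog = workQueue.RetrieveApproximateMessageCount();

        if (backlog > currentWorkers * 100 && currentWorkers < 20)
            return currentWorkers + 1;      // scale up gently

        if (backlog < currentWorkers * 10 && currentWorkers > 1)
            return currentWorkers - 1;      // scale down even more gently

        return currentWorkers;              // stay put
    }
}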

                                                                                            Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• Service choice can make a big cost difference depending on your app profile (see the back-of-the-envelope example below)
• Caching and compressing help a lot with storage costs
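A back-of-the-envelope comparison of the two billing models. Every rate below is an assumed placeholder for illustration, not an actual Azure price; check the current price list before deciding.

using System;

class StorageCostSketch
{
    static void Main()
    {
        // Assumed illustrative rates only.
        double perTableTransaction = 0.01 / 10000;  // $ per Windows Azure Table transaction (assumed)
        double perGbMonthStorage = 0.15;            // $ per GB-month of table storage (assumed)
        double sqlAzureFlatFee = 9.99;              // flat monthly fee for a small SQL Azure database (assumed)

        long transactionsPerMonth = 50000000;       // a transaction-heavy profile
        double storedGb = 0.5;

        double tablesCost = transactionsPerMonth * perTableTransaction
                          + storedGb * perGbMonthStorage;
        double sqlCost = sqlAzureFlatFee;           // flat, regardless of transaction count

        Console.WriteLine("Tables: ${0:F2}  SQL Azure: ${1:F2}", tablesCost, sqlCost);
    }
}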

                                                                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                            Compressing Content

1. Gzip all output content (a sketch of on-the-fly compression follows this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip offers much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
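For an ASP.NET web role, one common way to gzip output on the fly is to wrap the response stream in Global.asax. This is a minimal sketch; whether to compress every response or only text content types is an application decision.

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpRequest request = HttpContext.Current.Request;
        HttpResponse response = HttpContext.Current.Response;

        string acceptEncoding = request.Headers["Accept-Encoding"];
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            // Compress the body as it is written and tell the browser how to decode it.
            response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
            response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}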

[Diagram: uncompressed content is passed through Gzip, JavaScript minification, CSS minification and image minification to produce compressed content]

                                                                                            Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                            Measure everything

                                                                                            Know your application profile in and out

                                                                                            Research Examples in the Cloud

…on another set of slides

                                                                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                            Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                            Thank you for hosting me at the Summer School

                                                                                            BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive:
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volume of data:
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern:
  • split the input sequences
  • query partitions in parallel
  • merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance:
• argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large and extra-large instance sizes

Task granularity:
• Large partition: load imbalance
• Small partition: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task: essentially an estimate of the task run time.
• too small: repeated computation
• too large: unnecessarily long waiting time in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
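A sketch of how such an estimate might be applied when dequeuing a BLAST task with the StorageClient library. The pair-bases-per-minute rate is a made-up constant you would calibrate with test runs, and this assumes the GetMessage(TimeSpan) overload of the 1.x StorageClient.

using System;
using Microsoft.WindowsAzure.StorageClient;

static class BlastTaskTimeout
{
    // Hypothetical calibration from test runs: pair-bases a worker processes per minute.
    const double PairBasesPerMinute = 5000000.0;

    public static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition)
    {
        double minutes = pairBasesInPartition / PairBasesPerMinute;
        minutes *= 1.5;                        // safety margin, to avoid repeated computation
        minutes = Math.Min(minutes, 120.0);    // queue visibility timeout is capped at 2 hours
        return TimeSpan.FromMinutes(Math.Max(1.0, minutes));
    }

    public static CloudQueueMessage DequeueBlastTask(CloudQueue queue, long pairBasesInPartition)
    {
        // The message stays invisible for the estimated run time; if this instance
        // fails, the task reappears for another worker once the timeout expires.
        return queue.GetMessage(EstimateVisibilityTimeout(pairBasesInPartition));
    }
}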

[Diagram: a splitting task fans out to many BLAST tasks that run in parallel, followed by a merging task]

Task size vs. performance:
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance:
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost:
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine and database-updating role; worker instances pull from a global dispatch queue; an Azure Table holds the job registry, and Azure Blobs hold the NCBI databases, BLAST databases and temporary data]


ASP.NET program hosted by a web role instance:
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID.

The accepted job is stored in the job registry table:
• Fault tolerance: avoid in-memory state (an illustrative entity shape follows below)
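A plausible shape for such a job-registry entity, following the TableServiceEntity pattern shown later in these slides. The property names are illustrative, not the actual AzureBLAST schema.

using System;
using Microsoft.WindowsAzure.StorageClient;

// Illustrative job-registry entity: all job state lives in the table,
// so a web-role restart loses nothing.
public class BlastJobEntity : TableServiceEntity
{
    public string Owner { get; set; }          // Live ID of the submitter
    public string Status { get; set; }         // e.g. Submitted, Running, Done, Failed
    public DateTime SubmittedAt { get; set; }
    public string InputBlobName { get; set; }  // reference to the uploaded query sequences

    public BlastJobEntity() { }

    public BlastJobEntity(string jobId)
    {
        PartitionKey = "BlastJobs";
        RowKey = jobId;
    }
}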

[Diagram: the web portal and web service pass job registrations to the job scheduler; the job portal, scaling engine and job registry support the workflow]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs:
• Discover the interrelationships of known protein sequences

"All against All" query:
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation:
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know:
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divide the 10 million sequences into multiple segments; each is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe data center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes per group)
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed before the job was killed.
• A reasonable guess: the fault domain was at work

                                                                                            MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

                                                                                            Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage:
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage:
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage:
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage:
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Diagram: the AzureMODIS service web role portal accepts requests into a request queue; the data collection stage pulls source imagery from the download sites via a download queue, feeding the reprojection queue and then the Reduction 1 and Reduction 2 queues for the derivation and analysis reduction stages; source metadata is kept alongside, and scientists download the scientific results]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door:
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection or Reduction job queue
• The Service Monitor is a dedicated Worker Role:
  • Parses all job requests into tasks: recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> job queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> task queue]

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> task queue; GenericWorker (Worker Role) instances dequeue the tasks and read from <Input>Data Storage]

• Dequeues tasks created by the Service Monitor (a sketch of this loop follows below)
• Retries failed tasks 3 times
• Maintains all task status
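A minimal sketch of that dispatch loop, using the message's DequeueCount to give each task three attempts before treating it as poison. RunScienceExecutable and RecordTaskStatus are hypothetical helpers, not part of the actual MODISAzure code.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

static class GenericWorkerLoop
{
    public static void TaskLoop(CloudQueue taskQueue)
    {
        while (true)
        {
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            if (msg.DequeueCount > 3)
            {
                // Poison task: it has already failed three times somewhere.
                RecordTaskStatus(msg.AsString, "Failed");
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                RunScienceExecutable(msg.AsString);      // sandboxed work (hypothetical)
                RecordTaskStatus(msg.AsString, "Done");
                taskQueue.DeleteMessage(msg);            // delete only after success
            }
            catch (Exception)
            {
                // Leave the message alone; it reappears when its visibility timeout expires.
            }
        }
    }

    static void RunScienceExecutable(string taskId) { /* hypothetical */ }
    static void RecordTaskStatus(string taskId, string status) { /* hypothetical: persist to a Table */ }
}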

[Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the job queue to the task queue consumed by GenericWorker (Worker Role) instances]
• Each job entity specifies a single reprojection job request
• Each task entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
• Tasks read from the swath source data storage and write to the reprojection data storage

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates


• Source data collection: 400-500 GB, 60K files; 10 MB/sec, 11 hours; <10 workers; $50 upload, $450 storage
• Reprojection: 400 GB, 45K files; 3500 hours; 20-100 workers; $420 CPU, $60 download
• Derivation reduction: 5-7 GB, 55K files; 1800 hours; 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis reduction: <10 GB, ~1K files; 1800 hours; 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                            • Day 2 - Azure as a PaaS
                                                                                            • Day 2 - Applications

Queues Recap

• No need to deal with failures → make message processing idempotent
• Invisible messages result in out-of-order delivery → do not rely on order
• Enforce a threshold on a message's dequeue count → use the dequeue count to remove poison messages
• Messages > 8 KB → use a blob to store the message data, with a reference in the message (see the sketch below); batch messages; garbage-collect orphaned blobs
• Dynamically increase/reduce workers → use the message count to scale
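For the "messages > 8 KB" case, a sketch of the blob-reference pattern using the same StorageClient calls that appear in the code slides below; the helper arrangement is assumed, not part of any Azure API.

using System;
using Microsoft.WindowsAzure.StorageClient;

static class LargeMessageSketch
{
    // Queue messages are limited to 8 KB, so store the payload in a blob
    // and enqueue only the blob's name.
    public static void EnqueueLargePayload(CloudBlobContainer container, CloudQueue queue, string payload)
    {
        string blobName = Guid.NewGuid().ToString();
        container.GetBlobReference(blobName).UploadText(payload);
        queue.AddMessage(new CloudQueueMessage(blobName));
    }

    public static string DequeueLargePayload(CloudBlobContainer container, CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg == null) return null;

        CloudBlob blob = container.GetBlobReference(msg.AsString);
        string payload = blob.DownloadText();

        queue.DeleteMessage(msg);
        blob.Delete();   // or garbage-collect orphaned blobs in a background task
        return payload;
    }
}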


                                                                                              Windows Azure Storage Takeaways

                                                                                              Blobs

                                                                                              Drives

                                                                                              Tables

                                                                                              Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net


                                                                                              A Quick Exercise

…Then let's look at some code and some tools


Code - AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}


Code - BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}


Code - BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// And make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                                              53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                                                                              54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://&lt;AccountName&gt;.blob.core.windows.net/&lt;ContainerName&gt;/&lt;BlobName&gt; or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
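Because InitContainer made the container publicly readable, a plain HTTP GET is enough to read the blob back; a minimal sketch using WebClient (the URL is the one shown above):

using System;
using System.Net;

class BlobReadExample
{
    static void Main()
    {
        // Public container, so no storage key is needed for reads.
        string url = "http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt";
        using (WebClient web = new WebClient())
        {
            string csv = web.DownloadString(url);
            Console.WriteLine(csv);
        }
    }
}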

                                                                                              55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

                                                                                              56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                              57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                              58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                              59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                              60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                              61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                              62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                              63

Tools – Fiddler2

                                                                                              Best Practices

                                                                                              Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – a few larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                              Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows

• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using its CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

                                                                                              Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
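A minimal sketch of both flavors using the .NET 4 Task Parallel Library; the worker methods are placeholders for your own role code:

using System;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    static void Main()
    {
        // Data parallelism: partition a loop across the available cores.
        double[] results = new double[1000000];
        Parallel.For(0, results.Length, i =>
        {
            results[i] = Math.Sqrt(i);
        });

        // Task parallelism: run independent units of work concurrently.
        Task cpuWork = Task.Factory.StartNew(() => DoCpuIntensiveWork());
        Task ioWork = Task.Factory.StartNew(() => DoStorageIoWork());
        Task.WaitAll(cpuWork, ioWork);
    }

    static void DoCpuIntensiveWork() { /* placeholder */ }
    static void DoStorageIoWork() { /* placeholder */ }
}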

                                                                                              Finding Good Code Neighbors

• Typically code falls into one or more of these categories:
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

                                                                                              Memory Intensive

                                                                                              CPU Intensive

                                                                                              Network IO Intensive

                                                                                              Storage IO Intensive

                                                                                              Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs

                                                                                              Performance Cost

                                                                                              Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile
• Caching and compression help a lot with storage costs

                                                                                              Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places.
Sending fewer things means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                              Compressing Content

1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content]
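A minimal sketch of gzip-compressing responses in an ASP.NET web role, hooked from Global.asax; this is one common pattern, not necessarily the one used in the original demo:

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        string acceptEncoding = context.Request.Headers["Accept-Encoding"];

        // Compress only when the browser advertises gzip support.
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}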

                                                                                              Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                              Measure everything

                                                                                              Know your application profile in and out

                                                                                              Research Examples in the Cloud

…on another set of slides

                                                                                              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                              76

                                                                                              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                              77

Questions and Discussion…

                                                                                              Thank you for hosting me at the Summer School

                                                                                              BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volumes of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST (see the sketch below)
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes
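A minimal sketch of how a worker might launch NCBI-BLAST with "-a" matched to the VM's core count; the executable path, database, and file names are illustrative assumptions:

using System;
using System.Diagnostics;

class BlastLauncher
{
    static void RunPartition(string inputFasta, string outputFile)
    {
        int cores = Environment.ProcessorCount;   // 1, 2, 4, or 8 depending on the instance size

        ProcessStartInfo psi = new ProcessStartInfo
        {
            FileName = @"blast\blastall.exe",     // assumed location in local storage
            Arguments = string.Format("-p blastp -d nr -i {0} -o {1} -a {2}", inputFasta, outputFile, cores),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process blast = Process.Start(psi))
        {
            blast.WaitForExit();
        }
    }
}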

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

The value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait in case of instance failure
Best practice (see the sketch below):
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
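A minimal sketch of dequeuing a BLAST task with an estimated visibility timeout capped at the 2-hour maximum; EstimateRunTime is a hypothetical calibration derived from test runs:

using System;
using Microsoft.WindowsAzure.StorageClient;

public class BlastTaskDequeuer
{
    private static readonly TimeSpan MaxVisibilityTimeout = TimeSpan.FromHours(2);  // queue service limit

    // The partition size is fixed by the splitting step, so the estimate can be
    // computed once per job rather than per message.
    public CloudQueueMessage GetNextTask(CloudQueue taskQueue, long pairBasesPerPartition)
    {
        TimeSpan estimate = EstimateRunTime(pairBasesPerPartition);
        if (estimate > MaxVisibilityTimeout)
            estimate = MaxVisibilityTimeout;

        // The message stays invisible to other workers for the estimated run time.
        return taskQueue.GetMessage(estimate);
    }

    private static TimeSpan EstimateRunTime(long pairBases)
    {
        // Hypothetical calibration from profiling test runs.
        return TimeSpan.FromMinutes(Math.Max(5, pairBases / 100000));
    }
}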

[Diagram: Splitting task → BLAST tasks in parallel → Merging task]

Task size vs. performance
• Benefit from the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Diagram: AzureBLAST architecture – Web Role (Web Portal, Web Service, job registration), Job Management Role (Job Scheduler, Scaling Engine), Database-updating Role, Worker instances fed by a global dispatch queue, Azure Table (Job Registry), Azure Blob (BLAST databases, NCBI databases, temporary data, etc.)]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/Authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: Job Portal components – Web Portal, Web Service, job registration, Job Scheduler, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When the load is imbalanced, redistribute it manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should be:

Otherwise, something is wrong (e.g. a task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain (~30 mins, ~6 nodes in one group)

35 nodes experienced blob-writing failures at the same time

West Europe Datacenter: 30,976 tasks were completed, and the job was killed

A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

$$ET = \frac{\Delta R_n + \rho_a c_p\,\delta q\,g_a}{\big(\Delta + \gamma(1 + g_a/g_s)\big)\,\lambda_v}$$

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) by plants.
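A minimal sketch of the Penman-Monteith computation above in C#; parameter names follow the definitions on this slide, and the function is illustrative only:

static class PenmanMonteith
{
    public static double Evapotranspiration(
        double delta,    // Δ: rate of change of saturation specific humidity with air temperature (Pa/K)
        double rn,       // Rn: net radiation (W/m^2)
        double rhoA,     // ρa: dry air density (kg/m^3)
        double cp,       // cp: specific heat capacity of air (J/(kg K))
        double dq,       // δq: vapor pressure deficit (Pa)
        double ga,       // ga: conductivity of air (m/s)
        double gs,       // gs: conductivity of plant stoma (m/s)
        double gamma,    // γ: psychrometric constant (~66 Pa/K)
        double lambdaV)  // λv: latent heat of vaporization (J/g)
    {
        return (delta * rn + rhoA * cp * dq * ga) /
               ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}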

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: AzureMODIS service architecture – the AzureMODIS Service Web Role Portal receives requests; work flows through the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues across the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source imagery is pulled from download sites, source metadata is tracked, and scientific results are downloaded by scientists]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus and placed on the Job Queue; the Service Monitor (Worker Role) parses and persists ReprojectionTaskStatus and dispatches to the Task Queue; GenericWorker (Worker Role) instances consume the tasks, reading Swath Source Data Storage and writing Reprojection Data Storage, with lookups against the SwathGranuleMeta and ScanTimeList tables]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Pipeline diagram: Scientists interact with the AzureMODIS Service Web Role Portal; requests flow through the Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue and Reduction 2 Queue into the Data Collection, Reprojection, Derivation Reduction and Analysis Reduction stages, drawing on Source Metadata and the Source Imagery Download Sites, ending in Scientific Results Download]

Approximate scale and cost per stage:

• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload + $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 cpu + $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 cpu + $1 download + $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 cpu + $2 download + $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• They provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                              • Day 2 - Azure as a PaaS
                                                                                              • Day 2 - Applications

                                                                                                Windows Azure Storage Takeaways

                                                                                                Blobs

                                                                                                Drives

                                                                                                Tables

                                                                                                Queues

http://blogs.msdn.com/windowsazurestorage

http://azurescope.cloudapp.net

                                                                                                49

                                                                                                A Quick Exercise

…Then let's look at some code and some tools

                                                                                                50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                                51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();

        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                                52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);

    // Or, if you want to write a string, replace the last line with:
    //     blob.UploadText(someString);
    // and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
}

                                                                                                53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                                                                                54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");

    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());

    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at:
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case:
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
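Because InitContainer marked the container public, that URL can also be fetched by any plain HTTP client with no storage key. A minimal sketch (not from the slides) using the standard .NET WebClient:

using System;
using System.Net;

class ReadPublicBlob
{
    static void Main()
    {
        // Public container access means a simple HTTP GET is enough; no credentials required.
        using (WebClient web = new WebClient())
        {
            string csv = web.DownloadString(
                "http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt");
            Console.WriteLine(csv);
        }
    }
}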

                                                                                                55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

                                                                                                56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                                58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();

    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();

    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or query results) that easily fit in memory. It is one of many design patterns for working with tables; one alternative is sketched below.
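If the table grows beyond what comfortably fits in memory, one alternative is to query a single partition (or another filtered subset) instead of enumerating everything. A minimal sketch, assuming the Context, tableName and AttendeeEntity defined in TableHelper.cs above, plus System.Linq and System.Collections.Generic (the method name is illustrative):

public List<AttendeeEntity> GetAttendeesByPartition(string partitionKey)
{
    // Filter on PartitionKey so the table service returns only one partition,
    // instead of caching the entire table in memory.
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.PartitionKey == partitionKey
         select a).AsTableServiceQuery();

    // AsTableServiceQuery() follows continuation tokens, so large results
    // stream back in pages rather than in one oversized call.
    return query.ToList();
}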

                                                                                                61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                                                                                62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                63

Tools – Fiddler2

                                                                                                Best Practices

                                                                                                Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using up its CPU
• Balance between using up the CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
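A minimal Task Parallel Library sketch (illustrative only, not from the slides) showing both flavors:

using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        int[] workItems = new int[1000];

        // Data parallelism: partition the loop across all available cores.
        Parallel.For(0, workItems.Length, i =>
        {
            workItems[i] = i * i;   // stand-in for real per-item work
        });

        // Task parallelism: run independent pieces of work concurrently.
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing..."));
        Task upload   = Task.Factory.StartNew(() => Console.WriteLine("uploading..."));
        Task.WaitAll(compress, upload);
    }
}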

                                                                                                Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running (a rough worked example follows)
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idling VMs
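A rough worked example using the ~$3/day figure above (the instance count is illustrative): keeping 10 surplus instances idle costs roughly 10 × $3 = $30 per day, or about $900 over a 30-day month, which is the price being paid for that extra headroom.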

                                                                                                Performance Cost

                                                                                                Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

                                                                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                Compressing Content

1. Gzip all output content (see the sketch after the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content; Minify JavaScript, Minify CSS, Minify Images]
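A minimal sketch of gzip-compressing text in .NET before storing or serving it (the class and method names here are illustrative, not from the slides):

using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipHelper
{
    // Compress a string with gzip; the result can be stored as a blob or
    // returned to the browser with a "Content-Encoding: gzip" header.
    public static byte[] Compress(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // GZipStream must be closed before reading the buffer so the final block is flushed.
            return output.ToArray();
        }
    }
}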

                                                                                                Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                Measure everything

                                                                                                Know your application profile in and out

                                                                                                Research Examples in the Cloud

…on another set of slides

                                                                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                                76

                                                                                                Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                77

Questions and Discussion…

                                                                                                Thank you for hosting me at the Summer School

                                                                                                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker (see the sketch after this list)
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud
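A minimal sketch of that Web Role + Queue + Worker split/join shape using the StorageClient queue API. This is illustrative only, not the actual AzureBLAST code; the queue contents and the RunBlastOnPartition helper are placeholders:

using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.StorageClient;

public class BlastJobSketch
{
    // Splitting task: one queue message per input partition (the partitions are
    // assumed to already exist as blobs; the names here are illustrative).
    public void EnqueueBlastTasks(CloudQueue queue, IEnumerable<string> partitionBlobNames)
    {
        foreach (string blobName in partitionBlobNames)
            queue.AddMessage(new CloudQueueMessage(blobName));
    }

    // Worker role loop body: dequeue, run BLAST on the partition, delete the message.
    public void ProcessOneTask(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage();
        if (msg == null) return;               // nothing to do right now

        RunBlastOnPartition(msg.AsString);     // placeholder for invoking NCBI-BLAST
        queue.DeleteMessage(msg);              // delete only after the work succeeds
    }

    private void RunBlastOnPartition(string partitionBlobName)
    {
        // Stand-in only: a real worker downloads the partition blob, runs the
        // BLAST executable, and uploads the result for the merging task.
        Console.WriteLine("BLAST on " + partitionBlobName);
    }
}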

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partition: load imbalance
• Small partition: unnecessary overheads (NCBI-BLAST overhead, data transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs (a small sketch follows)
• Watch out for the 2-hour maximum limitation
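A minimal sketch of applying that estimate when dequeuing with the StorageClient queue API. It assumes a CloudQueue named queue; the 30-minute value and the console stand-in are illustrative:

// Assumes "queue" is the task CloudQueue used by the worker role.
public void ProcessOneTaskWithTimeout(CloudQueue queue)
{
    // The message stays invisible to other workers for the visibility timeout;
    // if this worker fails before deleting it, the task reappears automatically.
    CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(30));   // 30 min is an illustrative estimate
    if (msg == null) return;

    Console.WriteLine("Running BLAST on partition " + msg.AsString);      // stand-in for the real task
    queue.DeleteMessage(msg);   // must complete (and delete) before the timeout expires
}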

[Diagram: Splitting task → BLAST task, BLAST task, BLAST task, … → Merging Task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: AzureBLAST. A Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler, Scaling Engine and database-updating role; Worker instances pull work from a global dispatch queue; Azure Tables hold the Job Registry and NCBI databases metadata; Azure Blob storage holds the BLAST databases, temporary data, etc. Each job expands into a Splitting task, many BLAST tasks (…), and a Merging Task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram labels: Web Portal, Web Service, Job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB in size)
• A total of 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop (see the check below)
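As a quick check on that figure: 3,216,731 minutes ÷ (60 × 24 × 365) ≈ 6.1 years of continuous single-machine computation.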

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g. a task failed to complete, as with task 251774, which has no completion record):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group – this is an Update Domain (~30 mins, ~6 nodes in one group)

35 nodes experienced a blob writing failure at the same time

West Europe Data Center: 30,976 tasks were completed before the job was killed

A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish Proverb

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J/g)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ · Rn + ρa · cp · δq · ga) / ((Δ + γ · (1 + ga / gs)) · λv)

                                                                                                Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
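A small worked sketch of the Penman-Monteith calculation above, following the symbols as defined; it is illustrative only, and any sample values plugged in would be placeholders:

// Penman-Monteith ET, following the formula and symbol definitions above.
static double PenmanMonteithET(
    double delta,    // Δ  : d(saturation specific humidity)/dT   (Pa/K)
    double rn,       // Rn : net radiation                        (W/m^2)
    double rhoA,     // ρa : dry air density                      (kg/m^3)
    double cp,       // cp : specific heat capacity of air        (J/(kg·K))
    double dq,       // δq : vapor pressure deficit               (Pa)
    double ga,       // ga : conductivity of air                  (m/s)
    double gs,       // gs : conductivity of plant stoma          (m/s)
    double gamma,    // γ  : psychrometric constant (~66 Pa/K)
    double lambdaV)  // λv : latent heat of vaporization          (J/g)
{
    double numerator   = delta * rn + rhoA * cp * dq * ga;
    double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
    return numerator / denominator;
}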

Input data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

Scale: processing 20 US years ≈ processing 1 global year.

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; a request queue feeds the Data Collection Stage (which pulls from the source imagery download sites via the download queue), followed by the Reprojection Stage (reprojection queue), the Derivation Reduction Stage (reduction 1 queue), and the Analysis Reduction Stage (reduction 2 queue); source metadata is kept alongside, and scientific results are available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables (a sketch of such a status entity follows)
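As an illustration of persisting task status in Azure Tables with the same StorageClient types used later in this deck, here is a minimal sketch; the entity name, properties, and key choices are my assumptions, not the actual MODISAzure schema.

using System;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical status entity; the real MODISAzure schema is not shown in the talk.
public class PipelineStageTaskStatus : TableServiceEntity
{
    public string JobId { get; set; }       // which job this task belongs to
    public string Stage { get; set; }       // Download, Reprojection, Reduction1, Reduction2
    public string Status { get; set; }      // Queued, Running, Succeeded, Failed
    public int RetryCount { get; set; }
    public DateTime LastUpdated { get; set; }

    public PipelineStageTaskStatus() { }    // required for table deserialization

    public PipelineStageTaskStatus(string jobId, string taskId, string stage)
    {
        PartitionKey = jobId;   // all tasks of one job share a partition for cheap scans
        RowKey = taskId;
        JobId = jobId;
        Stage = stage;
        Status = "Queued";
        LastUpdated = DateTime.UtcNow;
    }
}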

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks to the <PipelineStage>Task queue.]

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files (a minimal sketch of this marshalling pattern follows)
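A minimal sketch of that marshalling pattern, assuming a local-storage resource named "LocalScratch" declared in the service definition; the class and method names are hypothetical and this is not the MODISAzure code.

using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class BlobMarshaller
{
    // Copies an input blob to the instance's local disk so a sandboxed executable can read it.
    public static string DownloadInput(CloudBlobContainer container, string blobName)
    {
        LocalResource scratch = RoleEnvironment.GetLocalResource("LocalScratch"); // assumed resource name
        string localPath = Path.Combine(scratch.RootPath, blobName);
        container.GetBlobReference(blobName).DownloadToFile(localPath);
        return localPath;
    }

    // Copies a local output file back to blob storage when the executable finishes.
    public static void UploadOutput(CloudBlobContainer container, string localPath)
    {
        CloudBlob blob = container.GetBlobReference(Path.GetFileName(localPath));
        blob.UploadFile(localPath);
    }
}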

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus entries; GenericWorker (Worker Role) instances are dispatched tasks from the <PipelineStage>Task queue and read/write <Input>Data storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(a minimal sketch of such a worker loop follows)
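Here is a minimal sketch of such a dequeue-and-retry loop using the Azure queue client; the visibility timeout, the three-attempt check via DequeueCount, and the ExecuteTask helper are my assumptions, not the actual GenericWorker implementation.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    public void Run(CloudStorageAccount account, string taskQueueName)
    {
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference(taskQueueName);
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Hide the message long enough for the task to complete.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            if (msg.DequeueCount > 3)
            {
                // Delivered more than three times already: give up on this task.
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                ExecuteTask(msg.AsString);      // run the sandboxed executable, marshal blobs, etc.
                taskQueue.DeleteMessage(msg);   // success: remove the task from the queue
            }
            catch (Exception)
            {
                // Do nothing: the message reappears after the visibility timeout and is retried.
            }
        }
    }

    private void ExecuteTask(string taskDescription) { /* application-specific work */ }
}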

[Diagram: a reprojection request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus from the job queue, parses and persists ReprojectionTaskStatus, and dispatches work on the task queue to GenericWorker (Worker Role) instances that read swath source data storage and write reprojection data storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates


Per-stage data volumes and costs:

Stage                    Data          Files    Compute                  Workers    Cost
Data collection          400-500 GB    60K      11 hours (10 MB/sec)     <10        $50 upload + $450 storage
Reprojection             400 GB        45K      3500 CPU hours           20-100     $420 CPU + $60 download
Derivation reduction     5-7 GB        55K      1800 CPU hours           20-100     $216 CPU + $1 download + $6 storage
Analysis reduction       <10 GB        ~1K      1800 CPU hours           20-100     $216 CPU + $2 download + $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                • Day 2 - Azure as a PaaS
                                                                                                • Day 2 - Applications

A Quick Exercise

…Then let's look at some code and some tools

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";   // placeholder key from the talk
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        // Make the container publicly readable.
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName> – or, in this case, http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or another exception occurred.
    }
}

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the in-memory cache.
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables; a streaming alternative is sketched below.
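For tables that do not fit in memory, one alternative pattern (a sketch under my own assumptions, not from the talk) is to stream a filtered table-service query and let it follow continuation tokens while enumerating; the method name below is hypothetical and would sit alongside the other TableHelper methods.

using System;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

// An additional TableHelper-style method (illustrative):
public void PrintAttendeeEmails(string partitionKey)
{
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.PartitionKey == partitionKey
         select a).AsTableServiceQuery();

    // The table-service query transparently follows continuation tokens while enumerating,
    // so results are streamed rather than cached in a dictionary.
    foreach (AttendeeEntity attendee in query)
        Console.WriteLine(attendee.Email);
}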

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2 (a web debugging proxy, useful for inspecting the REST traffic to Azure storage)

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using much CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct use of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library for data parallelism and task parallelism (a minimal sketch follows)
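A minimal sketch of both styles with the .NET 4 Task Parallel Library; the ProcessTile method and the tile names are made up for illustration.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    static void Main()
    {
        var tiles = new List<string> { "h08v05", "h09v05", "h10v05" };

        // Data parallelism: apply the same operation to every item, spread across cores.
        Parallel.ForEach(tiles, tile => ProcessTile(tile));

        // Task parallelism: run independent operations concurrently.
        Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading..."));
        Task reproject = Task.Factory.StartNew(() => Console.WriteLine("reprojecting..."));
        Task.WaitAll(download, reproject);
    }

    static void ProcessTile(string tile)
    {
        Console.WriteLine("processing " + tile);
    }
}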

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the performance risk of not having excess capacity (failures, poor user experience) and the cost of idling VMs

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs
• Bandwidth costs are a huge part of any popular web app's billing profile
• Sending fewer things over the wire often means getting fewer things from storage
  • Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content
1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
   • (a minimal sketch of gzip-compressed blob output follows the diagram note below)
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, plus JavaScript, CSS, and image minification, to produce compressed content.]
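A minimal sketch of gzip-compressing text before writing it to blob storage with the same StorageClient API used earlier; the class name and the content-type choice are my assumptions.

using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

static class CompressedBlobWriter
{
    public static void WriteCompressed(CloudBlobContainer container, string blobName, string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream buffer = new MemoryStream())
        {
            // Compress into the memory buffer, leaving the buffer open for upload.
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            buffer.Position = 0;

            CloudBlob blob = container.GetBlobReference(blobName);
            blob.Properties.ContentType = "text/plain";
            blob.Properties.ContentEncoding = "gzip";   // browsers decompress on the fly
            blob.UploadFromStream(buffer);
        }
    }
}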

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile inside and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes mean the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • Split the input sequences (a minimal splitting sketch follows the citation below)
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.
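A minimal sketch of the input-splitting step, assuming FASTA-formatted input and a partition size of 100 sequences (the value the measurements below favour); the class and file naming are illustrative, not the AzureBLAST implementation.

using System.Collections.Generic;
using System.IO;

static class FastaSplitter
{
    // Splits a FASTA file into partitions of at most `sequencesPerPartition` sequences.
    // Each partition can then be queued as one BLAST task; results are merged afterwards.
    public static List<string> Split(string fastaPath, string outputDir, int sequencesPerPartition = 100)
    {
        var partitionFiles = new List<string>();
        var current = new List<string>();
        int sequenceCount = 0, partitionIndex = 0;

        foreach (string line in File.ReadLines(fastaPath))
        {
            if (line.StartsWith(">"))           // a '>' header starts a new sequence
            {
                if (sequenceCount == sequencesPerPartition)
                {
                    partitionFiles.Add(Flush(current, outputDir, partitionIndex++));
                    current.Clear();
                    sequenceCount = 0;
                }
                sequenceCount++;
            }
            current.Add(line);
        }
        if (current.Count > 0)
            partitionFiles.Add(Flush(current, outputDir, partitionIndex));
        return partitionFiles;
    }

    private static string Flush(List<string> lines, string outputDir, int index)
    {
        string path = Path.Combine(outputDir, "partition_" + index + ".fasta");
        File.WriteAllLines(path, lines.ToArray());
        return path;
    }
}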

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long waiting time in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
(a minimal sketch of estimating the timeout follows)
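A minimal sketch of picking the visibility timeout when dequeuing a BLAST task; the assumed throughput constant, the safety margin, and the helper names are illustrative only.

using System;
using Microsoft.WindowsAzure.StorageClient;

static class BlastTaskDequeue
{
    // Hypothetical estimate: assume a fixed throughput of pair-bases per second, measured in test runs.
    static TimeSpan EstimateRunTime(long pairBasesInPartition)
    {
        const double pairBasesPerSecond = 50000;                  // assumed value from profiling
        double seconds = pairBasesInPartition / pairBasesPerSecond;
        TimeSpan estimate = TimeSpan.FromSeconds(seconds * 1.5);  // safety margin
        TimeSpan max = TimeSpan.FromHours(2);                     // queue API maximum
        return estimate < max ? estimate : max;
    }

    static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue, long pairBasesInPartition)
    {
        // The message stays invisible to other workers for the estimated run time.
        return taskQueue.GetMessage(EstimateRunTime(pairBasesInPartition));
    }
}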

[Diagram: a splitting task fans out into many BLAST tasks, whose outputs are combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, the scaling engine, and a database-updating role; worker instances pull splitting, BLAST, and merging tasks from a global dispatch queue; Azure Tables hold the job registry, and Azure Blob storage holds the NCBI/BLAST databases and temporary data.]

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram: the job portal (web portal and web service) handles job registration and hands jobs to the job scheduler, scaling engine, and job registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences
"All against All" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons
Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments (see the partitioning sketch below)
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When the load is imbalanced, redistribute it manually
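A minimal sketch, under assumed names and sizes, of the two-level split described above: segments (one per deployment/job) that are then chopped into fixed-size partitions (one per BLAST task).

using System.Collections.Generic;
using System.Linq;

public static class QuerySplitter
{
    // Split the full query set into 'segmentCount' roughly equal segments
    // (one segment per deployment/job).
    public static List<List<string>> SplitIntoSegments(IList<string> sequences, int segmentCount)
    {
        int perSegment = (sequences.Count + segmentCount - 1) / segmentCount;
        return SplitBySize(sequences, perSegment);
    }

    // Split one segment into fixed-size partitions (one partition per BLAST task).
    public static List<List<string>> SplitIntoPartitions(IList<string> segment, int partitionSize)
    {
        return SplitBySize(segment, partitionSize);
    }

    private static List<List<string>> SplitBySize(IList<string> items, int size)
    {
        var chunks = new List<List<string>>();
        for (int i = 0; i < items.Count; i += size)
            chunks.Add(items.Skip(i).Take(size).ToList());
        return chunks;
    }
}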


• Total size of the output results is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g. the task failed to complete).

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
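One way to do that analysis is to scan the logs for tasks that report "Executing" but never "done". A rough sketch, assuming log lines shaped like the samples above (the file path and exact phrasing are assumptions):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogChecker
{
    // Print the ids of tasks that have an "Executing the task N" record
    // but no matching "Execution of task N is done" record.
    public static void FindIncompleteTasks(string logFile)
    {
        var started = new HashSet<string>();
        var finished = new HashSet<string>();

        foreach (string line in File.ReadLines(logFile))
        {
            Match m = Regex.Match(line, @"Executing the task (\d+)");
            if (m.Success) started.Add(m.Groups[1].Value);

            m = Regex.Match(line, @"Execution of task (\d+) is done");
            if (m.Success) finished.Add(m.Groups[1].Value);
        }

        foreach (string id in started)
            if (!finished.Contains(id))
                Console.WriteLine("Task {0} started but never completed", id);
    }
}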

North Europe Data Center: 34,256 tasks processed in total
All 62 compute nodes lost tasks and then came back in a group. This is an update domain (~30 mins, ~6 nodes in one group).
35 nodes experienced blob-writing failures at the same time.
West Europe Data Center: 30,976 tasks were completed, and the job was killed.
A reasonable guess: the fault domain is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stomata (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
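To make the grouping of terms explicit, here is a direct transcription of the Penman-Monteith form above into code. Parameter names mirror the symbol list, and the default for γ is the approximate constant quoted above; this is a sketch of the formula only, not the MODISAzure implementation.

public static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    public static double Evapotranspiration(
        double delta,        // Δ: d(saturation specific humidity)/dT (Pa/K)
        double rn,           // Rn: net radiation (W/m^2)
        double rhoA,         // ρa: dry air density (kg/m^3)
        double cp,           // cp: specific heat capacity of air (J/(kg·K))
        double dq,           // δq: vapor pressure deficit (Pa)
        double ga,           // ga: conductivity of air (m/s)
        double gs,           // gs: conductivity of plant stomata (m/s)
        double lambdaV,      // λv: latent heat of vaporization (J/g)
        double gamma = 66.0) // γ: psychrometric constant (Pa/K)
    {
        return (delta * rn + rhoA * cp * dq * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}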

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms
Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use
Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: the AzureMODIS Service Web Role Portal receives requests and feeds a Download Queue (Data Collection Stage, pulling from the Source Imagery Download Sites), a Reprojection Queue (Reprojection Stage), and Reduction 1 and Reduction 2 Queues (Derivation and Analysis Reduction Stages); source metadata is kept alongside, and scientists submit requests and download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks: recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read/write the <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(A minimal sketch of such a dequeue/retry loop follows.)
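A minimal sketch of that dequeue/retry loop, using the same StorageClient API as the code later in this deck. The queue name is invented, and the real GenericWorker does considerably more (staging blobs, sandboxing the executable, persisting status to Tables).

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerSketch
{
    private const int MaxAttempts = 3;

    public void Run()
    {
        var account = new CloudStorageAccount(AccountInformation.Credentials, false);
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojectiontasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Hide the message for 30 minutes while we work on it; if the worker
            // dies, the message reappears and another instance retries it.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            if (msg.DequeueCount > MaxAttempts)
            {
                // Already attempted MaxAttempts times: record the failure and drop the task.
                taskQueue.DeleteMessage(msg);
                continue;
            }

            ExecuteTask(msg.AsString);     // run the sandboxed executable, stage blobs, etc.
            taskQueue.DeleteMessage(msg);  // delete only after the task succeeds
        }
    }

    private void ExecuteTask(string taskDescription) { /* ... */ }
}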

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue, from which GenericWorker (Worker Role) instances pull work against the Swath Source Data Storage and Reprojection Data Storage.]
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs were driven by the data scale and the need to run reductions multiple times
• Storage costs were driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Approximate cost breakdown by stage:
Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files; 3500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation reduction stage: 5-7 GB, 55K files; 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis reduction stage: <10 GB, ~1K files; 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage
Total: $1420
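As a rough sanity check on the CPU line items, assuming the small-instance compute rate of about $0.12 per hour that applied at the time (that rate is an assumption, not stated on these slides):
3500 hours × $0.12/hour ≈ $420
1800 hours × $0.12/hour ≈ $216
which matches the Reprojection and Reduction CPU figures above.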

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                  • Day 2 - Azure as a PaaS
                                                                                                  • Day 2 - Applications

                                                                                                    50

Code – AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

                                                                                                    51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                                    52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                                                    53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available
        return null;
    }
}

                                                                                                    54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                                    55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                                                                    56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                    57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                                    58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in the table - or other exception
    }
}

                                                                                                    59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                    60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                                    61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                                    62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                    63

Tools – Fiddler2

                                                                                                    Best Practices

                                                                                                    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                    Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which does not use up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                    Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
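A small sketch of both flavors with the .NET 4 Task Parallel Library; the work items (ProcessBlob and the two helper methods) are placeholders, not APIs from the slides.

using System;
using System.Threading.Tasks;

public static class ConcurrencyExamples
{
    public static void Run(string[] inputBlobNames)
    {
        // Data parallelism: apply the same operation to every item and
        // let the TPL decide how many threads to use.
        Parallel.ForEach(inputBlobNames, blobName =>
        {
            ProcessBlob(blobName);
        });

        // Task parallelism: run different operations concurrently.
        Task cleanup = Task.Factory.StartNew(() => CleanLocalScratchSpace());
        Task report  = Task.Factory.StartNew(() => UploadStatusReport());
        Task.WaitAll(cleanup, report);
    }

    private static void ProcessBlob(string name) { /* CPU-bound work on one blob */ }
    private static void CleanLocalScratchSpace() { /* housekeeping */ }
    private static void UploadStatusReport() { /* I/O-bound work */ }
}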

                                                                                                    Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                    Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs

                                                                                                    Performance Cost

                                                                                                    Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app's profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app's profile
• Caching and compressing help a lot with storage costs

                                                                                                    Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile
Sending fewer things over the wire often means getting fewer things from storage
Saving bandwidth costs often leads to savings in other places
Sending fewer things means your VM has time to do other tasks
All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                                    Compressing Content

1. Gzip all output content (see the sketch after the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip, minify JavaScript, minify CSS, minify images → Compressed Content]
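As an illustration of the first point, text content can be gzip-compressed with System.IO.Compression before it is served or written to blob storage. This generic sketch is not tied to the helper classes shown earlier in this deck.

using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipHelper
{
    // Compress a UTF-8 string with gzip; browsers that send
    // "Accept-Encoding: gzip" can decompress this on the fly.
    public static byte[] Compress(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The gzip stream is closed (flushed) before we read the bytes.
            return output.ToArray();
        }
    }
}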

                                                                                                    Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                    Measure everything

                                                                                                    Know your application profile in and out

                                                                                                    Research Examples in the Cloud

…on another set of slides

                                                                                                    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                                    76

                                                                                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                    77

Questions and Discussion…

                                                                                                    Thank you for hosting me at the Summer School

                                                                                                    BLAST (Basic Local Alignment Search Tool)

• The most important piece of software in bioinformatics
• Identifies similarity between bio-sequences

                                                                                                    Computationally intensive

• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker (a minimal sketch of this pattern follows the citation below)
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
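A minimal sketch of the Web Role + Queue + Worker split/join pattern, not the AzureBLAST implementation itself. It uses the Windows Azure StorageClient queue API; the queue name "blasttasks", the partition blob names, and the RunBlast helper are hypothetical.

using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class SplitJoinSketch
{
    private readonly CloudQueue taskQueue;

    public SplitJoinSketch(CloudStorageAccount account)
    {
        taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();
    }

    // Web role side: split the input and enqueue one message per partition.
    public void SubmitJob(string jobId, IEnumerable<string> partitionBlobNames)
    {
        foreach (string partition in partitionBlobNames)
        {
            // Message payload: which job and which input partition to process.
            taskQueue.AddMessage(new CloudQueueMessage(jobId + "|" + partition));
        }
    }

    // Worker role side: process one partition per message; a separate merging task
    // combines the outputs once all partitions report completion.
    public void ProcessOneTask()
    {
        CloudQueueMessage msg = taskQueue.GetMessage();
        if (msg == null) return;                    // queue is empty
        string[] parts = msg.AsString.Split('|');
        RunBlast(parts[0], parts[1]);               // hypothetical: run NCBI-BLAST on the partition
        taskQueue.DeleteMessage(msg);               // only delete after success
    }

    private void RunBlast(string jobId, string partitionBlob) { /* invoke NCBI-BLAST here */ }
}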

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait in case of instance failure
• Best practice: estimate the value based on the number of base pairs in the partition and on test runs
• Watch out for the 2-hour maximum limitation (a minimal sketch of setting the timeout appears below)
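A minimal sketch of using the queue visibility timeout as a task-run-time estimate, assuming the StorageClient GetMessage(TimeSpan) overload; EstimateMinutesFor and its linear model are hypothetical and would be calibrated from test runs.

using System;
using Microsoft.WindowsAzure.StorageClient;

public static class VisibilityTimeoutSketch
{
    public static CloudQueueMessage DequeueBlastTask(CloudQueue queue, int basePairsInPartition)
    {
        // Estimated run time for partitions of this size, from profiling test runs.
        TimeSpan estimate = TimeSpan.FromMinutes(EstimateMinutesFor(basePairsInPartition));

        // The queue service (at the time) capped the visibility timeout at 2 hours.
        TimeSpan twoHours = TimeSpan.FromHours(2);
        TimeSpan visibilityTimeout = estimate < twoHours ? estimate : twoHours;

        // The message stays invisible for this long; if the instance fails,
        // the message reappears and another worker can pick it up.
        return queue.GetMessage(visibilityTimeout);
    }

    private static double EstimateMinutesFor(int basePairs)
    {
        // Hypothetical linear model: tune the constants from your own test runs.
        return 1.0 + basePairs / 100000.0;
    }
}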

[Diagram: a Splitting task fans out into many parallel BLAST tasks, whose outputs are combined by a Merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler, Scaling Engine, and database-updating role; a global dispatch queue feeds the Worker instances; Azure Tables hold the Job Registry, and Azure Blob storage holds the NCBI/BLAST databases and temporary data. A Splitting task fans work out to parallel BLAST tasks, which are combined by a Merging task.]

Web portal: an ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (a minimal sketch of such a registry entity follows)
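A minimal sketch of a job registry entry persisted in an Azure table so that no job state lives only in memory; the entity and property names here are hypothetical, not AzureBLAST's actual schema.

using System;
using Microsoft.WindowsAzure.StorageClient;

public class JobRegistryEntity : TableServiceEntity
{
    public JobRegistryEntity() { }

    public JobRegistryEntity(string jobId)
    {
        PartitionKey = "AzureBlastJobs";   // single partition keeps job-listing queries simple
        RowKey = jobId;
    }

    public string SubmittedBy { get; set; }       // Live ID of the submitting user
    public DateTime SubmissionTime { get; set; }
    public string Status { get; set; }            // e.g. "Accepted", "Running", "Completed", "Failed"
    public string InputBlobName { get; set; }     // where the query sequences live
    public string ResultBlobName { get; set; }    // where the merged results are written
}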

[Diagram: the Web Portal and Web Service handle job registration; the Job Scheduler, Job Portal, Scaling Engine, and Job Registry coordinate execution.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• A total of 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (~10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart: this is an update domain at work.

35 nodes experienced blob-writing failures at the same time.

West Europe datacenter: 30,976 tasks were completed before the job was killed. A reasonable guess: the fault domain is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)

λv = Latent heat of vaporization (J g⁻¹)

Rn = Net radiation (W m⁻²)

cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)

ρa = Dry air density (kg m⁻³)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s⁻¹)

gs = Conductivity of plant stoma, air (inverse of rs) (m s⁻¹)

γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ · Rn + ρa · cp · δq · ga) / ((Δ + γ · (1 + ga/gs)) · λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) by plants.

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; a Request Queue feeds the Data Collection Stage, which pulls from the Source Imagery Download Sites via the Download Queue and records Source Metadata; the Reprojection Queue feeds the Reprojection Stage; the Reduction 1 and Reduction 2 Queues feed the Derivation and Analysis Reduction Stages; scientific results are then available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, i.e. recoverable units of work (a minimal sketch follows the diagram below)
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue.]
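A minimal sketch (not the MODISAzure code) of what the Service Monitor does: pull a job request off its job queue, persist one status row per task, and dispatch one queue message per task. TaskStatusEntity, the "TaskStatus" table name, the queue naming scheme, and SplitJobIntoTasks are all hypothetical.

using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class ServiceMonitorSketch
{
    private readonly CloudQueue jobQueue;
    private readonly CloudQueue taskQueue;
    private readonly TableServiceContext tableContext;

    public ServiceMonitorSketch(CloudStorageAccount account, string stage)
    {
        CloudQueueClient queues = account.CreateCloudQueueClient();
        jobQueue = queues.GetQueueReference(stage + "-jobs");
        taskQueue = queues.GetQueueReference(stage + "-tasks");
        tableContext = account.CreateCloudTableClient().GetDataServiceContext();
    }

    public void ProcessOneJob()
    {
        CloudQueueMessage jobMsg = jobQueue.GetMessage();
        if (jobMsg == null) return;

        // Parse the job into recoverable units of work; persist and dispatch each one.
        foreach (TaskStatusEntity task in SplitJobIntoTasks(jobMsg.AsString))
        {
            tableContext.AddObject("TaskStatus", task);
            taskQueue.AddMessage(new CloudQueueMessage(task.RowKey));
        }
        tableContext.SaveChanges();

        jobQueue.DeleteMessage(jobMsg);
    }

    private IEnumerable<TaskStatusEntity> SplitJobIntoTasks(string jobRequest)
    {
        yield break; // stage-specific parsing goes here
    }
}

public class TaskStatusEntity : TableServiceEntity { public string State { get; set; } }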

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(a minimal sketch of such a worker loop follows)
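A minimal sketch of the dequeue-and-retry behavior described above, not the MODISAzure GenericWorker itself. StageInputsFromBlobStorage, RunSandboxedExecutable, CopyOutputsToBlobStorage, MarkTaskCompleted, and MarkTaskFailed are hypothetical helpers, and the retry limit relies on the queue service reporting DequeueCount.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerSketch
{
    private readonly CloudQueue taskQueue;

    public GenericWorkerSketch(CloudQueue queue) { taskQueue = queue; }

    public void Run()
    {
        while (true)
        {
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            // DequeueCount tells us how many times this task has already been attempted.
            if (msg.DequeueCount > 3)
            {
                MarkTaskFailed(msg.AsString);      // hypothetical: update the task status table
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                StageInputsFromBlobStorage(msg.AsString);   // hypothetical: blob -> local files
                RunSandboxedExecutable(msg.AsString);       // hypothetical: the science code
                CopyOutputsToBlobStorage(msg.AsString);     // hypothetical: local files -> blob
                MarkTaskCompleted(msg.AsString);            // hypothetical status update
                taskQueue.DeleteMessage(msg);
            }
            catch (Exception)
            {
                // Leave the message in the queue; it becomes visible again when the
                // timeout expires and will be retried, up to the limit above.
            }
        }
    }

    private void StageInputsFromBlobStorage(string taskId) { }
    private void RunSandboxedExecutable(string taskId) { }
    private void CopyOutputsToBlobStorage(string taskId) { }
    private void MarkTaskCompleted(string taskId) { }
    private void MarkTaskFailed(string taskId) { }
}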

[Diagram: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus via the Job Queue, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue; GenericWorker (Worker Role) instances consult the ScanTimeList and SwathGranuleMeta tables and read/write the Reprojection Data Storage.]

• Each entity in the job status table specifies a single reprojection job request
• Each entity in the task status table specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
• The Swath Source Data Storage holds the source swath tiles

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Approximate data volumes, compute hours, and costs per stage (fronted by the AzureMODIS Service Web Role Portal):

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                    • Day 2 - Applications

                                                                                                      51

Code – BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
        {
            client = new CloudStorageAccount(AccountInformation.Credentials, false)
                         .CreateCloudBlobClient();
        }
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        // Make the container publicly readable.
        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}

                                                                                                      52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null) InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                                                      53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null) InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist, or there is no connection available.
        return null;
    }
}

                                                                                                      54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://&lt;AccountName&gt;.blob.core.windows.net/&lt;ContainerName&gt;/&lt;BlobName&gt;, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                                      55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // …
}

                                                                                                      56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                      57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
            {
                client = new CloudStorageAccount(AccountInformation.Credentials, false)
                             .CreateCloudTableClient();
            }
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null) context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                                      58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in the table - or other exception.
    }
}

                                                                                                      59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (!allAttendees.ContainsKey(email)) return;
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table.
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the in-memory cache.
    allAttendees.Remove(email);
}

                                                                                                      60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(email)) return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables; a minimal sketch of a direct key lookup, for tables too large to cache, follows.
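A minimal sketch of an alternative pattern, assuming it lives inside the same TableHelper class (it reuses Context and tableName): query the table service directly by partition and row key instead of reading everything into memory.

public AttendeeEntity GetAttendeeDirect(string email)
{
    // A point query on PartitionKey + RowKey is the cheapest table operation.
    var query = (from a in Context.CreateQuery<AttendeeEntity>(tableName)
                 where a.PartitionKey == "SummerSchool" && a.RowKey == email
                 select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query)
    {
        return attendee;   // at most one entity matches a point query
    }
    return null;
}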

                                                                                                      61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges) Context.SaveChanges();
}

                                                                                                      62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to the table.
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                      63

Tools – Fiddler2

                                                                                                      Best Practices

                                                                                                      Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                      Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which barely uses its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                      Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch after this list)
    • Data parallelism
    • Task parallelism
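As a quick illustration of the two flavors, here is a minimal, self-contained .NET 4 sketch; the work items and the task bodies are placeholders.

Code sketch – Task Parallel Library

using System;
using System.Linq;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        int[] workItems = Enumerable.Range(0, 1000).ToArray();

        // Data parallelism: the same operation applied to many items,
        // scheduled by the TPL across all cores of the role instance.
        Parallel.ForEach(workItems, item => ProcessItem(item));

        // Task parallelism: independent operations running concurrently.
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing output"));
        Task upload = Task.Factory.StartNew(() => Console.WriteLine("uploading to blob storage"));
        Task.WaitAll(compress, upload);
    }

    static void ProcessItem(int item)
    {
        // placeholder for CPU-bound work on one item
    }
}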

                                                                                                      Finding Good Code Neighbors

• Typically, code falls into one or more of the categories listed below
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

• Memory intensive
• CPU intensive
• Network I/O intensive
• Storage I/O intensive

                                                                                                      Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs

Performance vs. Cost

                                                                                                      Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app's profile
  • E.g., SQL Azure charges a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app's profile (see the sketch after this list)
• Caching and compressing help a lot with storage costs
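A back-of-the-envelope sketch of that comparison. The two rates below are placeholders, not actual Azure prices; plug in the current price sheet and your own measured transaction volume.

Code sketch – storage cost comparison (illustrative rates only)

using System;

class StorageCostComparison
{
    static void Main()
    {
        // Placeholder rates -- NOT real Azure prices.
        double costPerTableTransaction = 0.01 / 10000.0; // hypothetical: dollars per table transaction
        double sqlAzureFlatFeePerMonth = 10.0;           // hypothetical: dollars per database per month

        long monthlyTransactions = 50000000;             // taken from your app's measured profile

        double tableCost = monthlyTransactions * costPerTableTransaction; // grows with usage
        double sqlCost = sqlAzureFlatFeePerMonth;                         // flat, regardless of traffic

        Console.WriteLine("Tables: ${0:F2}  SQL Azure: ${1:F2}", tableCost, sqlCost);
        Console.WriteLine(tableCost < sqlCost ? "Tables look cheaper" : "SQL Azure looks cheaper");
    }
}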

                                                                                                      Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs therefore often leads to savings in other places.

Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                      Compressing Content

1. Gzip all output content (see the sketch after this list)
   • All modern browsers can decompress it on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
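A minimal sketch of point 1: gzip-compressing a text payload before it leaves the role. System.IO.Compression is part of the base class library; the helper name is ours.

Code sketch – GzipHelper.cs

using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipHelper
{
    // Compresses a text payload (e.g. HTML or JSON produced by the role)
    // so that fewer bytes go over the wire or into storage.
    public static byte[] Compress(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray(); // disposing the GZipStream flushes it into the MemoryStream
        }
    }
}

In an ASP.NET front end the same effect is usually achieved by enabling dynamic compression in IIS, so responses are gzipped automatically for browsers that advertise support.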

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]

                                                                                                      Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                      Research Examples in the Cloud

…on another set of slides

                                                                                                      Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research is announcing this week a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                                      76

                                                                                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                      77

Questions and Discussion…

                                                                                                      Thank you for hosting me at the Summer School

                                                                                                      BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation, data-parallel pattern:
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple split/join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task's run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before the task is retried if the instance fails

Best practice (a minimal worker-side sketch follows this list):
• Estimate the value based on the number of base pairs in the partition and on test runs
• Watch out for the 2-hour maximum limitation
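A minimal worker-side sketch of the Queue + Worker loop with an explicit visibilityTimeout, assuming the v1 StorageClient library; the queue name and the BLAST invocation are placeholders.

Code sketch – BLAST worker loop

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlastWorker
{
    public void Run(CloudStorageAccount account)
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue taskQueue = queueClient.GetQueueReference("blasttasks"); // hypothetical queue name
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // The visibilityTimeout is the estimated task run time (well under the 2-hour cap):
            // too small and another worker re-runs the task; too large and a failed
            // instance leaves the task invisible for a long time.
            CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(90));
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(10)); // queue empty; back off briefly
                continue;
            }

            RunBlastPartition(message.AsString);  // placeholder: run NCBI-BLAST on the named partition
            taskQueue.DeleteMessage(message);     // delete only after the task completed successfully
        }
    }

    private void RunBlastPartition(string partitionId)
    {
        // placeholder: launch the BLAST executable and upload results to blob storage
    }
}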

[Diagram: a splitting task fans the input out into many parallel BLAST tasks, whose results are combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and database-updating components; worker instances pull BLAST tasks from a global dispatch queue. Azure Tables hold the job registry, and Azure Blobs hold the NCBI databases, BLAST databases, and temporary data. A splitting task fans out into many BLAST tasks that are combined by a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication and authorization are based on Live ID

An accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram detail: the job portal and web service register jobs with the job scheduler, which records them in the job registry and drives the scaling engine.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs that we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment was submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occurred, the load was redistributed manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g., a task failed to complete). Note that task 251774 below never reports completion.

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group; this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stomata (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

                                                                                                      Penman-Monteith (1964)
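Transcribed directly into code as a sketch; the parameter names follow the symbol definitions above.

Code sketch – Penman-Monteith ET

public static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    public static double ComputeET(
        double delta,    // Δ  : rate of change of saturation specific humidity with air temperature (Pa/K)
        double rn,       // Rn : net radiation (W/m^2)
        double rhoA,     // ρa : dry air density (kg/m^3)
        double cp,       // cp : specific heat capacity of air (J/(kg·K))
        double deltaQ,   // δq : vapor pressure deficit (Pa)
        double ga,       // ga : conductivity of air (m/s)
        double gs,       // gs : conductivity of plant stomata (m/s)
        double gamma,    // γ  : psychrometric constant (~66 Pa/K)
        double lambdaV)  // λv : latent heat of vaporization (J/g)
    {
        return (delta * rn + rhoA * cp * deltaQ * ga)
               / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }
}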

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration or evaporation through plant membranes.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists submit requests through the AzureMODIS Service Web Role portal, which feeds a request queue; the data collection stage pulls source imagery from the download sites via a download queue; the reprojection stage works off the reprojection queue, and the derivation and analysis reduction stages work off the Reduction 1 and Reduction 2 queues; source metadata is tracked alongside, and scientific results are made available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
  • Persists the execution status of all jobs and tasks in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage> job queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage> task queue.]

All work is actually done by a GenericWorker (Worker Role)
• Sandboxes the science or other executable
• Marshals all storage between Azure blob storage and local Azure worker instance files
• Dequeues tasks created by the Service Monitor (see the sketch below)
• Retries failed tasks 3 times
• Maintains all task status

[Diagram: the Service Monitor (Worker Role) parses requests and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances dequeue from the <PipelineStage> task queue and read and write the <Input> data storage.]
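A minimal sketch of the dequeue-and-retry behavior just described, again assuming the v1 StorageClient library; the status-recording and executable-launching helpers are placeholders.

Code sketch – GenericWorker dequeue with retry limit

using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    public void ProcessOne(CloudQueue taskQueue)
    {
        CloudQueueMessage task = taskQueue.GetMessage();
        if (task == null) return;                 // nothing queued right now

        if (task.DequeueCount > 3)
        {
            // The task has already been attempted three times; mark it failed
            // and remove it so it cannot poison the queue forever.
            RecordTaskFailure(task.AsString);     // placeholder: update the task status table
            taskQueue.DeleteMessage(task);
            return;
        }

        RunExecutable(task.AsString);             // placeholder: run the sandboxed executable
        taskQueue.DeleteMessage(task);            // delete only after success
    }

    private void RecordTaskFailure(string taskId) { }
    private void RunExecutable(string taskId) { }
}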

[Diagram: a reprojection request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request), parses and persists ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), and dispatches work from the job queue to the task queue consumed by GenericWorker (Worker Role) instances. The SwathGranuleMeta table is queried for geo-metadata (e.g. boundaries) of each swath tile, and the ScanTimeList table for the list of satellite scan times that cover a target tile; swath source data and reprojection data are held in blob storage.]

• Computational costs are driven by the data scale and the need to run the reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates


Approximate scale and cost by stage:
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3,500 compute hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1,800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1,800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: ~$1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                      • Day 2 - Azure as a PaaS
                                                                                                      • Day 2 - Applications

                                                                                                        52

Code – BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null) InitContainer();
    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or, if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").

                                                                                                        53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null) InitContainer();
    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                                                                                        54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
        buff.AppendLine(attendee.ToCsvString());
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at
    http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
or, in this case,
    http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
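
A minimal sketch of reading that blob back, assuming the container was created with public read access (otherwise a Shared Access Signature or the account credentials would be needed). It uses only System.Net.WebClient; no Azure SDK is required on the client side.

using System;
using System.Net;

class BlobReadSketch
{
    static void Main()
    {
        // Public blobs can be fetched with a plain HTTP GET - no storage client needed.
        string url = "http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt";
        using (WebClient web = new WebClient())
        {
            string csv = web.DownloadString(url);
            Console.WriteLine(csv);
        }
    }
}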

                                                                                                        55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    ...
}

                                                                                                        56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                        57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

                                                                                                        58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                        59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (!allAttendees.ContainsKey(email)) return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                        60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(email)) return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables (see the alternative sketch below).
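
If the table is too large to cache, a point query against PartitionKey and RowKey avoids reading the whole table. A minimal sketch using the same Context and AsTableServiceQuery() surface as the code above (it also needs a using System.Linq; directive); the method name is illustrative and not part of the original TableHelper.

// Illustrative addition to TableHelper: fetch a single entity by its keys
// instead of caching the whole table. PartitionKey/RowKey filters are served
// efficiently by the table service.
public AttendeeEntity GetAttendeeByPointQuery(string email)
{
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.PartitionKey == "SummerSchool" && a.RowKey == email
         select a).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            return attendee;   // PartitionKey + RowKey identify at most one entity
    }
    catch (Exception)
    {
        // No connection available or other storage exception
    }
    return null;
}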

                                                                                                        61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges) Context.SaveChanges();
}

                                                                                                        62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                        63

Tools – Fiddler2

                                                                                                        Best Practices

                                                                                                        Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting code into multiple roles, each not using up its CPU

• Balance using up the CPU vs. having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
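
A minimal sketch of the two Task Parallel Library patterns named above (.NET 4); the work items and the operations inside the tasks are hypothetical placeholders.

using System;
using System.Threading.Tasks;

class ConcurrencySketch
{
    static void Main()
    {
        string[] workItems = { "tile0", "tile1", "tile2", "tile3" };   // hypothetical work items

        // Data parallelism: the same operation applied to every element, spread across cores.
        Parallel.ForEach(workItems, item => Console.WriteLine("processed " + item));

        // Task parallelism: independent operations running concurrently.
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing output"));
        Task upload   = Task.Factory.StartNew(() => Console.WriteLine("uploading results"));
        Task.WaitAll(compress, upload);
    }
}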

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find code that is intensive with different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale (see the sketch below)

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in poor user experience

• Trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs (performance vs. cost)
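
A minimal sketch of the scaling decision only. GetQueueLength, GetCurrentInstanceCount, and SetInstanceCount are hypothetical stand-ins (a real SetInstanceCount would call the Windows Azure Service Management API, which is not shown), and the thresholds are illustrative.

using System;

class ScalingSketch
{
    // Hypothetical stand-ins for monitoring and for the Service Management API call.
    static int GetQueueLength() { return 450; }
    static int GetCurrentInstanceCount() { return 2; }
    static void SetInstanceCount(int count) { Console.WriteLine("scale to " + count + " instances"); }

    static void Main()
    {
        int queueLength  = GetQueueLength();
        int currentCount = GetCurrentInstanceCount();

        // Illustrative policy: one worker per 100 queued items, bounded to [2, 20] so there
        // is always some spare capacity but never an unbounded bill.
        int desiredCount = Math.Max(2, Math.Min(20, queueLength / 100));

        // VMs take minutes to start and cost a few dollars a day, so only react to
        // changes that are large enough to be worth it.
        if (Math.Abs(desiredCount - currentCount) >= 2)
            SetInstanceCount(desiredCount);
    }
}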

Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile

• Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content (see the sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]
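
A minimal sketch of gzip-compressing text before storing or serving it, using the GZipStream class built into .NET; a real web response would additionally need the Content-Encoding: gzip header, and the helper name is illustrative.

using System.IO;
using System.IO.Compression;
using System.Text;

class CompressionSketch
{
    // Compress a string to gzip bytes, e.g. before uploading it to blob storage.
    static byte[] GzipText(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }   // disposing the GZipStream flushes the remaining compressed bytes
            return output.ToArray();
        }
    }
}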

Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                        Research Examples in the Cloud

...on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                                        76

                                                                                                        Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                        77

                                                                                                        Questions and Discussionhellip

                                                                                                        Thank you for hosting me at the Summer School

                                                                                                        BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the general suggested application model: Web Role + Queue + Worker (see the sketch below)

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
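
A minimal sketch of the worker side of the Web Role + Queue + Worker pattern, written against the 2011-era Microsoft.WindowsAzure.StorageClient queue API; the queue name and RunBlastOnPartition are hypothetical placeholders, not AzureBLAST's actual task handling.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastWorkerSketch
{
    static void Run(CloudStorageAccount account)
    {
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Hide the message from other workers while this instance processes it.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(60));
            if (msg == null) { Thread.Sleep(5000); continue; }   // queue empty: back off

            RunBlastOnPartition(msg.AsString);   // hypothetical: run BLAST over one partition
            taskQueue.DeleteMessage(msg);        // delete only after the work succeeded
        }
    }

    static void RunBlastOnPartition(string partition)
    {
        Console.WriteLine("BLAST on partition " + partition);
    }
}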

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions → load imbalance
• Small partitions → unnecessary overheads (NCBI-BLAST start-up overhead, data transfer overhead)
• Best practice: profile with test runs and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small → repeated computation
• Too large → unnecessarily long waiting time in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
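
A minimal sketch of estimating the visibilityTimeout from the partition size, as suggested above; the seconds-per-million-pair-bases rate and the padding factor are illustrative values that would come from your own test runs.

using System;

class VisibilityTimeoutSketch
{
    static TimeSpan EstimateTimeout(long pairBasesInPartition)
    {
        const double secondsPerMillionBases = 90.0;    // illustrative; measure with test runs
        double estimatedSeconds = (pairBasesInPartition / 1e6) * secondsPerMillionBases;
        double paddedSeconds = estimatedSeconds * 1.5; // slack so slow instances do not time out

        // Respect the 2-hour maximum the queue service allows for a message timeout.
        double cappedSeconds = Math.Min(paddedSeconds, TimeSpan.FromHours(2).TotalSeconds - 60);
        return TimeSpan.FromSeconds(cappedSeconds);
    }
}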

[Diagram: split/join pattern – a splitting task fans the input out into many parallel BLAST tasks, whose results are combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Diagram: AzureBLAST architecture – a web portal and web service (Web Role) handle job registration; a job scheduler and scaling engine (Job Management Role) parse jobs and feed a global dispatch queue consumed by the worker instances; a database-updating role refreshes the NCBI databases; the job registry lives in Azure Tables, and the BLAST databases, NCBI databases, temporary data, etc. live in Azure Blob storage.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the web portal and web service handle job registration and hand jobs to the job scheduler; the job portal, scaling engine, and job registry appear as in the architecture diagram above.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time...

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters – US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution (see the splitting sketch below)
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually
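
A minimal sketch of splitting a FASTA input file into partitions of a fixed number of sequences (100 per partition was the sweet spot in the profiling above); the file path and partition size are illustrative.

using System.Collections.Generic;
using System.IO;

class FastaSplitterSketch
{
    // Every line starting with '>' begins a new sequence; group a fixed number of
    // sequences (with their data lines) into each partition.
    static List<List<string>> Split(string fastaPath, int sequencesPerPartition)
    {
        var partitions = new List<List<string>>();
        var current = new List<string>();
        int sequencesInCurrent = 0;

        foreach (string line in File.ReadLines(fastaPath))
        {
            if (line.StartsWith(">"))
            {
                if (sequencesInCurrent == sequencesPerPartition)
                {
                    partitions.Add(current);
                    current = new List<string>();
                    sequencesInCurrent = 0;
                }
                sequencesInCurrent++;
            }
            current.Add(line);
        }
        if (current.Count > 0) partitions.Add(current);
        return partitions;
    }
}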


• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6–8 days

• Look into the log data to analyze what took place... (see the log-analysis sketch after the log excerpt below)


A normal log record should look like this:

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins
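
A minimal sketch of the kind of log check described above: collect the task ids that appear in "Executing the task" records and in "is done" records, and report the ones that started but never finished. The log file name is hypothetical, and the parsing assumes the record layout shown in the excerpt.

using System;
using System.Collections.Generic;
using System.IO;

class LogAnalysisSketch
{
    static void Main()
    {
        var started  = new HashSet<string>();
        var finished = new HashSet<string>();

        foreach (string line in File.ReadLines("worker.log"))   // hypothetical log file
        {
            string[] parts = line.Split(' ');
            int i = Array.IndexOf(parts, "task");
            if (i < 0 || i + 1 >= parts.Length) continue;
            string taskId = parts[i + 1].TrimEnd(',', '.');

            if (line.Contains("Executing the task")) started.Add(taskId);
            else if (line.Contains("is done")) finished.Add(taskId);
        }

        foreach (string taskId in started)
            if (!finished.Contains(taskId))
                Console.WriteLine("Task " + taskId + " started but never completed");
    }
}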

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 minutes apart) – this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is working

                                                                                                        MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)

λv = Latent heat of vaporization (J g⁻¹)

Rn = Net radiation (W m⁻²)

cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)

ρa = Dry air density (kg m⁻³)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s⁻¹)

gs = Conductivity of plant stoma, air (inverse of rs) (m s⁻¹)

γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                                                                                        Penman-Monteith (1964)
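
A minimal sketch that transcribes the Penman-Monteith formula above directly into code; the arguments follow the units in the variable list, and any further unit conversion to the stated ET units is left out.

using System;

class PenmanMonteithSketch
{
    // Direct transcription of: ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    static double ComputeET(
        double delta,    // Δ  : d(saturation specific humidity)/dT  (Pa K^-1)
        double rn,       // Rn : net radiation                       (W m^-2)
        double rhoA,     // ρa : dry air density                     (kg m^-3)
        double cp,       // cp : specific heat capacity of air       (J kg^-1 K^-1)
        double dq,       // δq : vapor pressure deficit              (Pa)
        double ga,       // ga : conductivity of air                 (m s^-1)
        double gs,       // gs : conductivity of plant stoma         (m s^-1)
        double gamma,    // γ  : psychrometric constant, ≈ 66        (Pa K^-1)
        double lambdaV)  // λv : latent heat of vaporization         (J g^-1)
    {
        double numerator   = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}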

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) from plants.

Input data sources:
• NASA MODIS imagery source archives – 5 TB (600K files)
• FLUXNET curated sensor dataset – 30 GB (960 files)
• FLUXNET curated field dataset – 2 KB (1 file)
• NCEP/NCAR – ~100 MB (4K files)
• Vegetative clumping – ~5 MB (1 file)
• Climate classification – ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: MODISAzure pipeline – the AzureMODIS service web role portal takes requests (request queue, source metadata) and drives the download, reprojection, reduction 1, and reduction 2 queues; source imagery is pulled from download sites in the data collection stage, flows through the reprojection, derivation reduction, and analysis reduction stages, and scientists download the resulting science results.]

                                                                                                        httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue (a sketch of the enqueueing step follows below)
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
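The enqueueing code itself is not shown on the slides; below is a minimal sketch of how a web-role front door might drop a job request onto a stage queue with the v1 StorageClient library. The queue name ("reprojection-jobs") and the message payload format are illustrative assumptions, not the service's actual names.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class JobSubmitter
{
    private readonly CloudQueueClient queueClient;

    public JobSubmitter(CloudStorageAccount account)
    {
        queueClient = account.CreateCloudQueueClient();
    }

    // Queue name and message format are assumptions for illustration; the real
    // service presumably persists the full job record in a Table and only
    // passes identifiers on the queue.
    public void SubmitReprojectionJob(string jobId, string targetTile)
    {
        CloudQueue queue = queueClient.GetQueueReference("reprojection-jobs");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(jobId + "|" + targetTile));
    }
}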

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshals all storage to/from Azure blob storage and the local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read their <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times (see the sketch below)
• Maintains all task status
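The GenericWorker's dequeue loop is not reproduced on the slides; here is a minimal sketch of the dequeue-and-retry-three-times pattern using the v1 StorageClient queue API. The class name, the 30-minute visibility timeout, and the poison-message handling are illustrative assumptions.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    private readonly CloudQueue taskQueue;   // e.g. the <PipelineStage>Task Queue

    public GenericWorkerLoop(CloudQueue taskQueue) { this.taskQueue = taskQueue; }

    public void Run()
    {
        while (true)
        {
            // Hide the message long enough for the task to finish.
            CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (message == null) { Thread.Sleep(5000); continue; }

            // DequeueCount records how many times this task has already been attempted.
            if (message.DequeueCount > 3)
            {
                // Give up after three retries; a real worker would also record
                // the failure in the task status table.
                taskQueue.DeleteMessage(message);
                continue;
            }

            try
            {
                RunTask(message.AsString);           // sandboxed executable runs here
                taskQueue.DeleteMessage(message);    // only delete on success
            }
            catch (Exception)
            {
                // Do nothing: the message becomes visible again and is retried.
            }
        }
    }

    private void RunTask(string taskDescription) { /* marshal blobs, run executable, upload results */ }
}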

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue onto the Task Queue; GenericWorker (Worker Role) instances execute the tasks against Reprojection Data Storage and Swath Source Data Storage.]

Table callouts from the diagram:
• Each ReprojectionJobStatus entity specifies a single reprojection job request.
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile).
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile.
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile.

• Computational costs driven by data scale and the need to run reductions multiple times
• Storage costs driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates

[Cost summary annotated on the AzureMODIS Service Web Role Portal pipeline diagram:]

Data Collection (download) stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                        • Day 2 - Applications

                                                                                                          53

Code – BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}

                                                                                                          54

Application Code – Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>, or in this case http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
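WriteStringToBlob itself is not shown on these slides; the following is a minimal sketch of what such a helper might look like, assuming the same client/container fields and InitContainer() used by GetBlobText above.

public void WriteStringToBlob(string blobName, string content)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        // UploadText overwrites the blob if it already exists.
        blob.UploadText(content);
    }
    catch (Exception)
    {
        // Swallow failures the same way GetBlobText does; a production helper
        // would log or rethrow instead.
    }
}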

                                                                                                          55

Code – TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    …
}

                                                                                                          56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                          57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

                                                                                                          58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                          59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                          60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                                          61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}

                                                                                                          62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Your tables are now accessible using REST service calls or any cloud storage tool.

                                                                                                          63

Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT IO Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch below)
  • Data parallelism
  • Task parallelism
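A minimal sketch of both flavors with the .NET 4 Task Parallel Library; the work items and data here are placeholders, not part of the talk's sample code.

using System;
using System.Linq;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        // Data parallelism: the same operation applied to every element,
        // spread across all cores of the role instance.
        int[] tiles = Enumerable.Range(0, 1000).ToArray();
        Parallel.ForEach(tiles, tile =>
        {
            ProcessTile(tile);
        });

        // Task parallelism: independent pieces of work run concurrently.
        Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading input"));
        Task compress = Task.Factory.StartNew(() => Console.WriteLine("compressing output"));
        Task.WaitAll(download, compress);
    }

    static void ProcessTile(int tile) { /* CPU-bound work goes here */ }
}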

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network IO intensive, storage IO intensive
• Find code that is intensive with different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• It is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs (performance vs. cost)

Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference based on your app profile
• Caching and compressing: they help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Saving bandwidth costs often leads to savings in other places:
• Sending fewer things over the wire often means getting fewer things from storage
• Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size (see the sketch below)
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]
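As a concrete example of point 2, here is a minimal sketch of gzip-compressing text before storing it and decompressing it on the way back, trading a little CPU for smaller storage and bandwidth bills. The helper names are illustrative, not from the talk.

using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipHelper
{
    // Compress a string to gzip bytes before uploading.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Decompress gzip bytes back into the original string after downloading.
    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}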

Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                          Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research is this week announcing a project code-named Daytona for Map Reduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
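For readers new to the pattern, here is a minimal, framework-free sketch of the map/reduce idea (word count) in C#. This only illustrates the programming model; it is not Daytona's or Hadoop's API.

using System;
using System.Collections.Generic;
using System.Linq;

class WordCount
{
    static void Main()
    {
        string[] documents = { "the cloud is large", "the cloud is elastic" };

        // Map: emit (word, 1) pairs from every document; each document could be
        // processed on a different worker.
        var pairs = documents.SelectMany(doc =>
            doc.Split(' ').Select(word => new KeyValuePair<string, int>(word, 1)));

        // Shuffle + Reduce: group by key and sum the counts.
        var counts = pairs.GroupBy(p => p.Key)
                          .Select(g => new { Word = g.Key, Count = g.Sum(p => p.Value) });

        foreach (var c in counts)
            Console.WriteLine(c.Word + ": " + c.Count);
    }
}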

                                                                                                          76

                                                                                                          Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                          77

Questions and Discussion…

                                                                                                          Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multi-core capability of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for small, medium, large, and extra large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting time in case of instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
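A minimal sketch of how a BLAST worker might pick the visibility timeout when dequeuing a task, assuming a per-partition runtime estimate. The estimating function, its calibration constant, and the queue are illustrative assumptions, not AzureBLAST's actual code.

using System;
using Microsoft.WindowsAzure.StorageClient;

class BlastTaskDequeuer
{
    // Rough runtime estimate from the number of pair-bases in the partition;
    // the constant here is a placeholder that a real deployment would calibrate with test runs.
    static TimeSpan EstimateRunTime(long pairBasesInPartition)
    {
        double minutes = pairBasesInPartition / 1000000.0;   // assumed calibration factor
        // The queue service caps the visibility timeout at 2 hours, so stay under it.
        return TimeSpan.FromMinutes(Math.Min(minutes, 115));
    }

    static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue, long pairBasesInPartition)
    {
        // Hide the message for the estimated run time: too short and another worker
        // repeats the computation; too long and a failed instance delays the retry.
        return taskQueue.GetMessage(EstimateRunTime(pairBasesInPartition));
    }
}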

[Diagram: a splitting task fans out into many BLAST tasks, followed by a merging task.]

Task size vs. Performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. Performance
• Super-linear speedup with larger worker instance sizes
• Primarily due to the memory capability

Task size / instance size vs. Cost
• The extra-large instance generated the best and the most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work through a global dispatch queue to Worker instances; a Database updating Role refreshes the NCBI databases; Azure Tables hold the Job Registry and Azure Blobs hold the BLAST databases, temporary data, etc. A splitting task fans out into many BLAST tasks followed by a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory states

[Diagram: the Web Portal and Web Service handle job registration, feeding the Job Scheduler, Scaling Engine, and Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record should look like the lines below; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
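One way to dig into what took place is a small log-scanning utility. Below is a minimal sketch that pairs "Executing" and "is done" records and flags tasks that never completed; the log-line format is assumed from the excerpt above, and the file path comes from the command line.

// Hypothetical log-scanning sketch: pair "Executing" and "is done" records per
// task ID and flag tasks that never reported completion.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class LogScan
{
    static void Main(string[] args)
    {
        var started = new Dictionary<string, string>();   // task id -> start line
        var startRe = new Regex(@"Executing the task (\d+)");
        var doneRe  = new Regex(@"Execution of task (\d+) is done");

        foreach (string line in File.ReadLines(args[0]))
        {
            Match m = startRe.Match(line);
            if (m.Success) { started[m.Groups[1].Value] = line; continue; }

            m = doneRe.Match(line);
            if (m.Success) started.Remove(m.Groups[1].Value);
        }

        // Whatever remains started but never reported completion, e.g. the
        // instance was rebooted or a blob write failed.
        foreach (var kv in started)
            Console.WriteLine("Task {0} never completed: {1}", kv.Key, kv.Value);
    }
}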

North Europe Data Center: a total of 34,256 tasks processed

All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes in one group): this is an Update Domain at work

35 nodes experienced blob write failures at the same time

West Europe Data Center: 30,976 tasks were completed, then the job was killed

A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
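To make the formula concrete, here is a minimal worked sketch of the Penman-Monteith computation in C#; the numeric inputs are invented placeholders, not MODISAzure data.

// Hypothetical worked example of the Penman-Monteith formula above.
// All numeric inputs are illustrative placeholders.
using System;

class PenmanMonteith
{
    // Returns the evapotranspiration flux given the quantities defined above.
    static double ComputeET(
        double delta,   // Δ  : d(saturation specific humidity)/dT   (Pa/K)
        double rn,      // Rn : net radiation                        (W/m^2)
        double rhoA,    // ρa : dry air density                      (kg/m^3)
        double cp,      // cp : specific heat capacity of air        (J/(kg K))
        double dq,      // δq : vapor pressure deficit               (Pa)
        double ga,      // ga : aerodynamic conductivity             (m/s)
        double gs,      // gs : stomatal conductivity                (m/s)
        double gamma,   // γ  : psychrometric constant               (Pa/K)
        double lambdaV) // λv : latent heat of vaporization          (J/g)
    {
        double numerator   = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }

    static void Main()
    {
        // Placeholder inputs purely to exercise the formula.
        double et = ComputeET(delta: 145, rn: 400, rhoA: 1.2, cp: 1004,
                              dq: 1000, ga: 0.02, gs: 0.01,
                              gamma: 66, lambdaV: 2450);
        Console.WriteLine("ET = {0}", et);
    }
}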

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR data: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[MODISAzure architecture diagram: scientists submit requests through the AzureMODIS Service Web Role portal; work flows through the Download Queue (Data Collection Stage, pulling from source imagery download sites), the Reprojection Queue (Reprojection Stage), and the Reduction 1 and Reduction 2 Queues (Derivation and Analysis Reduction Stages), with source metadata tracked alongside and scientific results available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door • Receives all user requests • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role • Parses all job requests into tasks, the recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables (an entity sketch follows the diagram below)

[Job dispatch diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job in the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks to the <PipelineStage> Task Queue.]
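As a rough illustration of persisting task status in Tables, here is a minimal sketch of a status entity written in the same StorageClient style as the table code later in this deck; the class name, fields, and key choices are assumptions for illustration, not the actual MODISAzure schema.

// Hypothetical sketch of a per-task status entity; the real MODISAzure schema
// is not shown in this deck, so names and fields are illustrative.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class PipelineTaskStatus : TableServiceEntity
{
    public string JobId { get; set; }        // parent <PipelineStage> job
    public string Stage { get; set; }        // Download, Reprojection, Reduction1, ...
    public string State { get; set; }        // Queued, Running, Succeeded, Failed
    public int RetryCount { get; set; }      // how many times the task was retried
    public DateTime StartedUtc { get; set; }
    public DateTime CompletedUtc { get; set; }

    public PipelineTaskStatus() { }

    public PipelineTaskStatus(string jobId, string taskId)
    {
        // Partition by job so all tasks of one job can be queried together.
        PartitionKey = jobId;
        RowKey = taskId;
        JobId = jobId;
        State = "Queued";
    }
}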

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage to and from Azure blob storage and local Azure Worker instance files (see the blob staging sketch below)
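A minimal sketch of that kind of blob marshalling with the StorageClient API is shown below; the container names, blob names, and the "LocalScratch" local-storage resource are illustrative assumptions.

// Hypothetical blob staging sketch: copy an input blob to the worker's local
// disk before running the science executable, then push the result back.
// Container, blob, and path names are illustrative only.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobStaging
{
    public static string StageInput(CloudStorageAccount account, string containerName, string blobName)
    {
        // "LocalScratch" must be declared as LocalStorage in the service definition.
        LocalResource scratch = RoleEnvironment.GetLocalResource("LocalScratch");
        string localPath = System.IO.Path.Combine(scratch.RootPath, blobName);

        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference(containerName);
        container.GetBlobReference(blobName).DownloadToFile(localPath);
        return localPath;
    }

    public static void PublishResult(CloudStorageAccount account, string containerName,
                                     string blobName, string localPath)
    {
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference(containerName);
        container.CreateIfNotExist();
        container.GetBlobReference(blobName).UploadFile(localPath);
    }
}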

[Task execution diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances pull tasks from the queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor (a minimal queue-polling sketch follows below)

• Retries failed tasks 3 times

• Maintains all task status
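Below is a rough sketch of such a dequeue-and-retry loop using the StorageClient queue API; the queue name, the 30-minute visibility timeout, and the ProcessTask helper are placeholders, while the give-up-after-three-attempts policy follows the bullet above.

// Hypothetical worker loop: pull a task message, give up on it after three
// failed attempts (tracked via DequeueCount), and record status elsewhere.
using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerLoop
{
    public void Run(CloudStorageAccount account)
    {
        CloudQueue taskQueue =
            account.CreateCloudQueueClient().GetQueueReference("reprojectiontasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Hide the message while we work on it; if we crash, it reappears.
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(TimeSpan.FromSeconds(5)); continue; }

            if (msg.DequeueCount > 3)
            {
                // Already failed three times: drop it and mark the task failed.
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                ProcessTask(msg.AsString);      // run the sandboxed executable
                taskQueue.DeleteMessage(msg);   // success: remove from the queue
            }
            catch (Exception)
            {
                // Do nothing: the message becomes visible again after the
                // timeout and will be retried by this or another instance.
            }
        }
    }

    private void ProcessTask(string taskDescription)
    {
        // Placeholder for the actual science computation.
    }
}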

[Reprojection example diagram: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus entries, and dispatches work from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances execute the tasks, reading from Swath Source Data Storage and writing to Reprojection Data Storage.

Supporting tables:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile]

• Computational costs are driven by data scale and the need to run the reduction multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate student rates

[Cost figure: the MODISAzure pipeline diagram (AzureMODIS Service Web Role portal plus the Download, Reprojection, Reduction 1, and Reduction 2 Queues) annotated with per-stage resource use and Azure costs.

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420]

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                          • Day 2 - Azure as a PaaS
                                                                                                          • Day 2 - Applications

                                                                                                            54

Application Code - Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at:
http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or, in this case:
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

                                                                                                            55

Code - TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                                                                            56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                            57

Code - TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}

                                                                                                            58

Code - TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                            59

Code - TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (!allAttendees.ContainsKey(email)) return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                            60

Code - TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(email)) return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                                            61

Pseudo Code - TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges) Context.SaveChanges();
}

                                                                                                            62

Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                            63

Tools - Fiddler2

                                                                                                            Best Practices

                                                                                                            Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                            Using Your VM to the Maximum

Remember: • 1 role instance == 1 VM running Windows

• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using up its CPU

• Balance between using up CPU and having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

                                                                                                            Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch below)

• Data parallelism

• Task parallelism
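A minimal .NET 4 Task Parallel Library sketch showing both flavors follows; the partition names and the work done in each task are placeholders.

// Hypothetical TPL sketch: data parallelism with Parallel.ForEach and task
// parallelism with Task.Factory. The inputs and the work are placeholders.
using System;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        // Data parallelism: apply the same operation to every item,
        // letting the TPL decide how many cores to use.
        string[] partitions = { "part-0", "part-1", "part-2", "part-3" };
        Parallel.ForEach(partitions, p => Console.WriteLine("processing " + p));

        // Task parallelism: run different operations concurrently.
        Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading input"));
        Task cleanup  = Task.Factory.StartNew(() => Console.WriteLine("cleaning local disk"));
        Task.WaitAll(download, cleanup);
    }
}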

                                                                                                            Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, or storage I/O intensive

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                            Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• There is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of having idling VMs

Performance vs. Cost

                                                                                                            Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference based on your app profile

• Caching and compressing help a lot with storage costs

                                                                                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                                            Compressing Content

1. Gzip all output content (a minimal sketch follows the diagram below)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs

[Diagram: Uncompressed Content passes through Gzip, plus minification of JavaScript, CSS, and images, to produce Compressed Content.]
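As a small illustration of the first point, here is a minimal sketch that gzips a text payload with .NET's GZipStream before it is stored or sent; in a real ASP.NET application you would more likely enable compression at the IIS level or with a response filter.

// Hypothetical sketch: gzip a text payload with System.IO.Compression before
// storing or sending it. Real web apps usually enable gzip at the IIS level.
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipHelper
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}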

                                                                                                            Best Practices Summary

Doing 'less' is the key to saving costs

Measure everything

Know your application profile in and out

                                                                                                            Research Examples in the Cloud

…on another set of slides

                                                                                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud • A Hadoop implementation • Hadoop has a long history and has been improved for stability • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure • Designed from the start to use cloud primitives • Built-in fault tolerance • REST-based interface for writing your own clients

                                                                                                            76

                                                                                                            Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

                                                                                                            77

Questions and Discussion…

                                                                                                            Thank you for hosting me at the Summer School

                                                                                                            BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)

• Needs special result-reduction processing

Large volume of data

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth demand could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern • Split the input sequences • Query the partitions in parallel • Merge the results together when done

• Follows the general suggested application model • Web Role + Queue + Worker

• With three special considerations • Batch job management

• Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" (number of processors) argument of NCBI-BLAST
• set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • data-transfer overhead
Best practice: profile with test runs and set the partition size to mitigate the overhead (a splitting-task sketch follows).
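To make the granularity point concrete, here is a minimal splitting-task sketch that partitions a FASTA input into chunks of N sequences; the file paths, naming scheme, and partition size are illustrative assumptions, not AzureBLAST's actual code.

// Sketch: split a FASTA file into partitions of a fixed number of sequences.
// Paths and naming are illustrative; tune sequencesPerPartition via test runs.
using System.Collections.Generic;
using System.IO;

static class FastaSplitter
{
    public static List<string> Split(string inputPath, string outputDir, int sequencesPerPartition)
    {
        var partitionFiles = new List<string>();
        var currentLines = new List<string>();
        int sequencesInPartition = 0, partitionIndex = 0;

        foreach (string line in File.ReadLines(inputPath))
        {
            if (line.StartsWith(">"))                       // a FASTA header starts a new sequence
            {
                if (sequencesInPartition == sequencesPerPartition)
                {
                    partitionFiles.Add(Flush(currentLines, outputDir, partitionIndex++));
                    currentLines.Clear();
                    sequencesInPartition = 0;
                }
                sequencesInPartition++;
            }
            currentLines.Add(line);
        }
        if (currentLines.Count > 0)
            partitionFiles.Add(Flush(currentLines, outputDir, partitionIndex));
        return partitionFiles;
    }

    private static string Flush(List<string> lines, string outputDir, int index)
    {
        string path = Path.Combine(outputDir, "partition_" + index + ".fasta");
        File.WriteAllLines(path, lines);                    // each partition becomes one BLAST task's input
        return path;
    }
}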

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before another instance picks up the task if the original instance fails

Best practice:
• Estimate the value from the number of base pairs in the partition and from test runs
• Watch out for the 2-hour maximum limitation (a worker-loop sketch follows)
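A minimal worker-loop sketch with the StorageClient library of that era, assuming a queue named "blasttasks" and a hypothetical RunBlast helper: each message stays invisible for the estimated task duration and is deleted only after the task succeeds, so a failed instance lets the task reappear for another worker.

// Sketch only: queue name, back-off interval, and RunBlast() are assumptions.
using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static class BlastWorker
{
    public static void WorkerLoop(CloudStorageAccount account)
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue taskQueue = queueClient.GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        // Estimated task run time; must stay under the 2-hour maximum.
        TimeSpan visibilityTimeout = TimeSpan.FromMinutes(90);

        while (true)
        {
            CloudQueueMessage message = taskQueue.GetMessage(visibilityTimeout);
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(10));   // queue is empty; back off briefly
                continue;
            }

            RunBlast(message.AsString);                   // hypothetical: run NCBI-BLAST on the named partition

            // Delete only after success; if the instance dies first, the message
            // becomes visible again after the timeout and another worker picks it up.
            taskQueue.DeleteMessage(message);
        }
    }

    private static void RunBlast(string partitionName)
    {
        // Placeholder for launching NCBI-BLAST with the appropriate "-a" thread count.
    }
}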

[Diagram: a Splitting task fans the input out to many BLAST tasks that run in parallel; a Merging task joins their results.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilize the resource

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and database-updating role; a global dispatch queue feeds the pool of workers; Azure Tables hold the job registry; Azure Blobs hold the NCBI/BLAST databases and temporary data.]


The job portal is an ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
• Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (an illustrative entity sketch follows)
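For illustration only, a job-registry entity might look like the sketch below (property names are assumptions, not AzureBLAST's actual schema); since every accepted job is persisted this way, a recycled web role instance loses no state.

// Illustrative job-registry entity; all property names are assumptions.
using System;
using Microsoft.WindowsAzure.StorageClient;

public class JobEntity : TableServiceEntity
{
    public string Owner { get; set; }           // Live ID of the submitter
    public string InputBlobUri { get; set; }    // where the query sequences live
    public string Status { get; set; }          // e.g. "Accepted", "Running", "Done"
    public DateTime SubmittedAt { get; set; }

    public JobEntity() { }                      // required for deserialization

    public JobEntity(string jobId, string owner)
    {
        PartitionKey = "AzureBlastJobs";        // a single partition keeps the sketch simple
        RowKey = jobId;
        Owner = owner;
        Status = "Accepted";
        SubmittedAt = DateTime.UtcNow;
    }
}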


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (4.2 GB); 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated ~4,000 cores in total: 475 extra-large VMs (8 cores per VM) across four datacenters, US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• The 10 million sequences are divided into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record pair looks like this; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

Note that task 251774 has an "Executing" record but never logged a completion.
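A rough sketch of the kind of log scan that flags such gaps, using the record format shown above (the parsing details are assumptions):

// Sketch: flag tasks with an "Executing" record but no matching "is done" record.
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class LogScanner
{
    public static IEnumerable<string> FindIncompleteTasks(string logPath)
    {
        var started = new HashSet<string>();
        var completed = new HashSet<string>();
        Regex executing = new Regex(@"Executing the task (\d+)");
        Regex done = new Regex(@"Execution of task (\d+) is done");

        foreach (string line in File.ReadLines(logPath))
        {
            Match m = executing.Match(line);
            if (m.Success) started.Add(m.Groups[1].Value);
            m = done.Match(line);
            if (m.Success) completed.Add(m.Groups[1].Value);
        }

        foreach (string taskId in started)
            if (!completed.Contains(taskId))
                yield return taskId;   // e.g. task 251774 in the sample above
    }
}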

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time

West Europe Datacenter: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
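As a worked example, the formula above maps directly to a small function; the parameter names follow the symbol list, and the commented sample values are purely illustrative, not MODISAzure inputs.

// Minimal sketch of the Penman-Monteith calculation defined above.
// Inputs use the SI units listed; the sample call values are made up.
static class EvapotranspirationSketch
{
    public static double PenmanMonteithET(
        double delta,    // Δ  : rate of change of saturation specific humidity (Pa K-1)
        double Rn,       // Rn : net radiation (W m-2)
        double rhoA,     // ρa : dry air density (kg m-3)
        double cp,       // cp : specific heat capacity of air (J kg-1 K-1)
        double dq,       // δq : vapor pressure deficit (Pa)
        double ga,       // ga : conductivity of air (m s-1)
        double gs,       // gs : conductivity of plant stoma (m s-1)
        double gamma,    // γ  : psychrometric constant (Pa K-1)
        double lambdaV)  // λv : latent heat of vaporization (J g-1)
    {
        double numerator = delta * Rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }

    // Example call with illustrative numbers only:
    // double et = PenmanMonteithET(145, 400, 1.2, 1005, 800, 0.02, 0.01, 66, 2450);
}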

Source data inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: requests enter through the AzureMODIS Service Web Role Portal into the Request Queue; the Data Collection Stage pulls source imagery from the download sites via the Download Queue; work then flows through the Reprojection Queue (Reprojection Stage), the Reduction 1 Queue (Derivation Reduction Stage), and the Reduction 2 Queue (Analysis Reduction Stage), with source metadata kept alongside; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables (a dispatch sketch follows)
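A rough sketch of the parse-and-dispatch idea; the queue name, message format, and SaveTaskStatus helper are illustrative assumptions rather than the service's real code.

// Sketch: split one reprojection job into per-tile tasks, persist status, enqueue work.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static class ServiceMonitorSketch
{
    public static void DispatchReprojectionJob(CloudStorageAccount account, string jobId, string[] tileIds)
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue taskQueue = queueClient.GetQueueReference("reprojectiontasks");  // illustrative name
        taskQueue.CreateIfNotExist();

        foreach (string tileId in tileIds)
        {
            // Persist a task-status row first so the task is a recoverable unit of work,
            // then enqueue a small message that points at it.
            SaveTaskStatus(jobId, tileId, "Queued");
            taskQueue.AddMessage(new CloudQueueMessage(jobId + "|" + tileId));
        }
    }

    private static void SaveTaskStatus(string jobId, string tileId, string status)
    {
        // Placeholder for writing a <PipelineStage>TaskStatus entity to an Azure table.
    }
}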

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists a <PipelineStage>JobStatus entity and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus entities, and dispatches tasks onto the <PipelineStage>Task Queue.]

All work is actually done by a GenericWorker (Worker Role)
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage and to/from local Azure worker-instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus entries and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances consume the queue and read <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times (a retry sketch follows)
• Maintains all task status
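One way to implement the three-attempt policy is via the message's dequeue count, sketched below; MarkTaskFailed and RunTask are placeholders for the status update and the sandboxed executable.

// Sketch: give a task up to 3 attempts, then record it as failed and stop retrying.
using System;
using Microsoft.WindowsAzure.StorageClient;

static class GenericWorkerSketch
{
    public static void ProcessNextTask(CloudQueue taskQueue)
    {
        CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(60));
        if (message == null) return;

        if (message.DequeueCount > 3)
        {
            MarkTaskFailed(message.AsString);   // placeholder: update the task-status table
            taskQueue.DeleteMessage(message);   // stop retrying this task
            return;
        }

        try
        {
            RunTask(message.AsString);          // placeholder: sandboxed executable + blob marshalling
            taskQueue.DeleteMessage(message);   // success: remove the message for good
        }
        catch (Exception)
        {
            // Do nothing: the message reappears after the visibility timeout
            // and DequeueCount will be one higher on the next attempt.
        }
    }

    private static void MarkTaskFailed(string taskId) { }
    private static void RunTask(string taskId) { }
}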

[Diagram: a Reprojection Request is persisted as a ReprojectionJobStatus entity and placed on the Job Queue; the Service Monitor (Worker Role) parses it, persists ReprojectionTaskStatus entities, and dispatches onto the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]

The tables involved:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates

[Same pipeline diagram as above (AzureMODIS Service Web Role Portal, Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues), annotated with per-stage scale and cost:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                            • Day 2 - Azure as a PaaS
                                                                                                            • Day 2 - Applications

                                                                                                              55

Code - TableEntities

using System;
using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
    // ...
}

                                                                                                              56

Code - TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                              57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
            {
                client = new CloudStorageAccount(AccountInformation.Credentials, false)
                             .CreateCloudTableClient();
            }
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
            {
                context = Client.GetDataServiceContext();
            }
            return context;
        }
    }
    // (class continues on the following slides)

                                                                                                              58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                              59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (!allAttendees.ContainsKey(email)) return;   // nothing to delete

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                              60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(email)) return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.

                                                                                                              61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges) Context.SaveChanges();
}

                                                                                                              62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                              63

Tools – Fiddler2

                                                                                                              Best Practices

                                                                                                              Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                              Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using much CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                              Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (sketch below)
    • Data parallelism
    • Task parallelism
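A small illustration of both flavors using the .NET 4 Task Parallel Library; the work inside the delegates is just a placeholder.

// Data parallelism: process a collection across all cores.
// Task parallelism: run independent units of work concurrently.
using System;
using System.Threading.Tasks;

class TplSketch
{
    static void Main()
    {
        int[] partitions = { 0, 1, 2, 3, 4, 5, 6, 7 };

        // Data parallelism: one delegate applied to many items.
        Parallel.ForEach(partitions, p =>
        {
            Console.WriteLine("Processing partition " + p);   // placeholder work
        });

        // Task parallelism: different work items running concurrently.
        Task download = Task.Factory.StartNew(() => Console.WriteLine("Downloading input"));
        Task compute  = Task.Factory.StartNew(() => Console.WriteLine("Crunching numbers"));
        Task.WaitAll(download, compute);
    }
}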

                                                                                                              Finding Good Code Neighbors

• Typically code is intensive in one or more of these resources: memory, CPU, network I/O, or storage I/O
• Find pieces of code that are intensive in different resources and have them live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

                                                                                                              Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is valuable at large scale (a minimal sketch follows this list)
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or a poor user experience from having no excess capacity against the cost of idling VMs
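A minimal sketch of one such heuristic, scaling on queue depth. The thresholds are illustrative assumptions, and RequestInstanceCountChange is a hypothetical stand-in for whatever mechanism you use to change the instance count (for example the Service Management API, or a manual edit of the service configuration):

```csharp
using System;

// Sketch (assumptions throughout): a conservative queue-depth-based scaling heuristic.
static class ScalingHeuristic
{
    const int MessagesPerInstance = 100; // assumed per-instance throughput
    const int MinInstances = 2;          // headroom for failures and updates
    const int MaxInstances = 20;         // cap spend

    public static void Evaluate(int queueLength, int currentInstances)
    {
        int target = (int)Math.Ceiling((double)queueLength / MessagesPerInstance);
        target = Math.Max(MinInstances, Math.Min(MaxInstances, target));

        // Scale up promptly; scale down only once the queue has fully drained,
        // so a brief lull does not trigger a costly spin-down/spin-up cycle.
        if (target > currentInstances || (target < currentInstances && queueLength == 0))
            RequestInstanceCountChange(target);
    }

    // Hypothetical stand-in for a Service Management API call or a manual
    // change to the instance count in the service configuration.
    static void RequestInstanceCountChange(int count)
    {
        Console.WriteLine("Requesting {0} instances", count);
    }
}
```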

Performance and Cost

                                                                                                              Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction (see the worked example after this list)
• The service choice can make a big cost difference depending on your app profile
• Caching and compression help a lot with storage costs
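As a worked example of why the profile matters, the sketch below compares a flat monthly fee against per-transaction billing. The prices and transaction volume are illustrative assumptions, not actual rates:

```csharp
using System;

// Sketch: cost comparison between a flat monthly fee and per-transaction
// billing for a measured application profile. All numbers are assumptions.
class StorageCostComparison
{
    static void Main()
    {
        const double flatMonthlyFee = 10.00;         // assumed flat database fee
        const double pricePer10kTransactions = 0.01; // assumed table transaction rate

        long monthlyTransactions = 200000000;        // from your app's measured profile

        double tableCost = (monthlyTransactions / 10000.0) * pricePer10kTransactions;

        Console.WriteLine("Flat fee:        {0:C}", flatMonthlyFee);
        Console.WriteLine("Per-transaction: {0:C}", tableCost);
        Console.WriteLine("Cheaper option:  {0}",
            tableCost < flatMonthlyFee ? "table storage" : "flat-fee database");
    }
}
```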

                                                                                                              Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's bill.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.
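One common way to send fewer things over the wire from an ASP.NET web role is to mark slowly-changing responses as cacheable, so browsers and any proxy in front of the role fetch them from cache instead of from your VM and storage account. A minimal sketch; the max-age value is an assumption to tune per resource:

```csharp
using System;
using System.Web;

// Sketch: mark a slowly-changing response as publicly cacheable.
public static class CachingHelpers
{
    public static void MarkCacheable(HttpResponse response, TimeSpan maxAge)
    {
        response.Cache.SetCacheability(HttpCacheability.Public);
        response.Cache.SetMaxAge(maxAge);
        response.Cache.SetExpires(DateTime.UtcNow.Add(maxAge));
    }
}
```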

                                                                                                              Compressing Content

1. Gzip all output content (a minimal sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress (LZW), Gzip gives much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images]
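A minimal sketch of point 1 for an ASP.NET web role, using the standard System.Web response-filter and GZipStream APIs; in practice IIS dynamic compression can often be enabled instead of rolling your own filter:

```csharp
using System;
using System.IO.Compression;
using System.Web;

// Sketch: gzip-compress responses for browsers that advertise support.
// Wire this up from Application_BeginRequest in Global.asax of the web role.
public static class GzipFilter
{
    public static void Apply(HttpContext context)
    {
        string acceptEncoding = context.Request.Headers["Accept-Encoding"];
        if (string.IsNullOrEmpty(acceptEncoding) || !acceptEncoding.Contains("gzip"))
            return; // client cannot decompress on the fly

        // Wrap the outgoing response stream in a compressing stream.
        context.Response.Filter =
            new GZipStream(context.Response.Filter, CompressionMode.Compress);
        context.Response.AppendHeader("Content-Encoding", "gzip");
    }
}
```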

                                                                                                              Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                              Research Examples in the Cloud

…on another set of slides

                                                                                                              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been hardened for stability
  • Originally designed for cluster systems

• Microsoft Research is announcing this week a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                                              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                              Thank you for hosting me at the Summer School

                                                                                                              BLAST (Basic Local Alignment Search Tool)

• The most important piece of software in bioinformatics
• Identifies similarity between biological sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Requires special result-reduction processing

Large data volumes
• A typical BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of a single instance
• The "-a" argument of NCBI-BLAST
• Use 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Too-large partitions: load imbalance
• Too-small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate these overheads (a splitting sketch follows).
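A minimal sketch of the query-segmentation step, splitting the input sequences into fixed-size partitions. The partition size is the knob to profile; the measurements later in these slides favored about 100 sequences per partition:

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: query segmentation for the split/join pattern. Each partition
// becomes one BLAST task message on the dispatch queue.
static class QuerySegmentation
{
    public static List<List<string>> Split(IList<string> sequences, int partitionSize)
    {
        var partitions = new List<List<string>>();
        for (int i = 0; i < sequences.Count; i += partitionSize)
        {
            // Take the next chunk of at most partitionSize sequences.
            partitions.Add(sequences.Skip(i).Take(partitionSize).ToList());
        }
        return partitions;
    }
}
```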

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task's run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before another instance picks the task back up after an instance failure

Best practice:
• Estimate the value from the number of pair-bases in the partition and from test runs
• Watch out for the 2-hour maximum limit (see the sketch below)
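A sketch of how the timeout is applied when a worker dequeues a BLAST task, assuming the StorageClient queue API that shipped with the Azure SDK of this era; EstimateRunTime and RunBlast are hypothetical helpers standing in for the partition-size estimate and the NCBI-BLAST invocation:

```csharp
using System;
using Microsoft.WindowsAzure.StorageClient;

class BlastTaskDequeue
{
    // Dequeue one BLAST task, hiding it from other workers for a timeout
    // derived from its estimated run time.
    public static void ProcessOneTask(CloudQueue taskQueue)
    {
        TimeSpan estimate = EstimateRunTime(); // hypothetical: based on pair-bases in the partition
        double paddedMinutes = Math.Min(estimate.TotalMinutes * 1.5, 115); // pad, but stay under the 2-hour cap

        CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(paddedMinutes));
        if (msg == null) return;               // queue is empty

        RunBlast(msg.AsString);                // hypothetical: run NCBI-BLAST on the partition
        taskQueue.DeleteMessage(msg);          // delete only after success, so a failed
                                               // instance lets the message reappear
    }

    static TimeSpan EstimateRunTime() { return TimeSpan.FromMinutes(20); }
    static void RunBlast(string partition) { /* launch blastall on the partition */ }
}
```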

[Diagram: a Splitting task fans the input out to many BLAST tasks running in parallel, followed by a Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition was the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• Extra-large instances delivered the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role hosts the Job Scheduler, Scaling Engine, and database-updating role; Worker instances pull from a global dispatch queue; Azure Tables hold the Job Registry; Azure Blobs hold the NCBI/BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Job Portal's Web Portal and Web Service feed job registration into the Job Scheduler, which works with the Scaling Engine and the Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST saved significant computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM), across four datacenters – US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• The 10 million sequences were divided into multiple segments; each segment was submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalance appears, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the start/done pairs below; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Note that task 251774 is started but never reported as done.)

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group – this is an update domain
• ~30 mins; ~6 nodes in one group
• 35 nodes experienced a blob-writing failure at the same time

West Europe datacenter: 30,976 tasks were completed and then the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) from plants.
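Written out directly from the definitions above, the derivation step is a single expression; the sketch below simply transcribes it (units must match the definitions above):

```csharp
// Direct transcription of the Penman-Monteith expression above.
static class PenmanMonteith
{
    public static double Evapotranspiration(
        double delta,   // Δ : rate of change of saturation specific humidity with temperature (Pa/K)
        double rn,      // Rn: net radiation (W/m^2)
        double rhoA,    // ρa: dry air density (kg/m^3)
        double cp,      // cp: specific heat capacity of air (J/(kg K))
        double dq,      // δq: vapor pressure deficit (Pa)
        double ga,      // ga: conductivity of air (m/s)
        double gs,      // gs: conductivity of plant stomata (m/s)
        double gamma,   // γ : psychrometric constant (~66 Pa/K)
        double lambdaV) // λv: latent heat of vaporization (J/g)
    {
        double numerator   = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}
```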

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads the requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests on a Request Queue; a Download Queue drives the Data Collection Stage, which pulls from the source imagery download sites; a Reprojection Queue drives the Reprojection Stage; Reduction 1 and Reduction 2 Queues drive the Derivation and Analysis Reduction Stages; source metadata is tracked throughout, and scientists download the scientific results]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables (a sketch of one such entity follows)
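A sketch of what such a task-status entity could look like with the StorageClient table data model of the era; the property names are illustrative, not the actual MODISAzure schema:

```csharp
using System;
using Microsoft.WindowsAzure.StorageClient;

// Sketch: one entity per task, partitioned by job so all of a job's tasks
// can be scanned together.
public class TaskStatusEntity : TableServiceEntity
{
    public TaskStatusEntity(string jobId, string taskId)
    {
        PartitionKey = jobId;   // all tasks of a job share a partition
        RowKey = taskId;        // unique within the job
    }

    public TaskStatusEntity() { } // required for deserialization

    public string Stage { get; set; }        // e.g. Download, Reprojection, Reduction
    public string State { get; set; }        // Queued, Running, Succeeded, Failed
    public int DequeueAttempts { get; set; } // supports the retry policy
    public DateTime LastUpdatedUtc { get; set; }
}
```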

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses it, persists <PipelineStage>TaskStatus, and dispatches onto the <PipelineStage>Task Queue]

All the work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local files on the Azure Worker instance

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (see the sketch below)
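A minimal sketch of that dequeue-and-retry loop, assuming the StorageClient queue API of the era; RunTask and MarkTaskFailed are hypothetical stand-ins for the sandboxed executable and the table-status update described above:

```csharp
using System;
using Microsoft.WindowsAzure.StorageClient;

// Sketch: the generic worker dequeue loop with a three-attempt retry policy.
class GenericWorkerLoop
{
    const int MaxAttempts = 3;

    public static void PollOnce(CloudQueue taskQueue)
    {
        CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
        if (msg == null) return;

        if (msg.DequeueCount > MaxAttempts)
        {
            MarkTaskFailed(msg.AsString);  // hypothetical: record failure in the status table
            taskQueue.DeleteMessage(msg);  // stop retrying a poison task
            return;
        }

        RunTask(msg.AsString);             // hypothetical: marshal blobs locally, run the executable
        taskQueue.DeleteMessage(msg);      // delete only after success
    }

    static void RunTask(string task) { }
    static void MarkTaskFailed(string task) { }
}
```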

Reprojection example:

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus by the MODISAzure Service; the Service Monitor (Worker Role) parses it, persists ReprojectionTaskStatus, and dispatches from the Job Queue onto the Task Queue for GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and by the need to run the reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small compared with the people costs, even at graduate-student rates

[Cost figure: the pipeline diagram above, annotated per stage with data volumes, compute hours, worker counts, and costs]

• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• They provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premise compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                              • Day 2 - Azure as a PaaS
                                                                                                              • Day 2 - Applications

                                                                                                                56

Code – TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}

                                                                                                                57

Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }

                                                                                                                58

Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
            allAttendees[attendee.Email] = attendee;
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}

                                                                                                                59

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

                                                                                                                60

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables; a point-query alternative is sketched below.
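Where the table is too large to cache, a server-side point query on PartitionKey and RowKey avoids pulling everything down. The sketch below reuses the Context, tableName, and AttendeeEntity members from the slides above; the method name and the hard-coded "SummerSchool" partition key mirror UpdateKeys() but are otherwise an illustrative assumption, not part of the original sample.

// Assumes this method lives inside the TableHelper class shown above.
// Requires: using System.Linq; using Microsoft.WindowsAzure.StorageClient;
public AttendeeEntity GetAttendeeByPointQuery(string email)
{
    // Filter on PartitionKey + RowKey so only the matching entity crosses the wire,
    // instead of reading the whole table into the in-memory dictionary.
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.PartitionKey == "SummerSchool" && a.RowKey == email
         select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query)
        return attendee;    // at most one entity can match a full-key filter

    return null;            // no match
}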

                                                                                                                61

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees == null)
        ReadAllAttendees();     // make sure the cache exists before checking it

    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
        Context.SaveChanges();
}

                                                                                                                62

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

                                                                                                                63

Tools – Fiddler2

                                                                                                                Best Practices

                                                                                                                Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice: fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake: splitting code into multiple roles, each not using up its CPU

• Balance using up the CPU against keeping free capacity for times of need

• There are multiple ways to use your CPU to the fullest

                                                                                                                Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (a minimal sketch follows below)
  • Data parallelism
  • Task parallelism
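As a small, self-contained illustration of the two TPL styles named above (the work items and console output are placeholders, not Azure APIs):

using System;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Data parallelism: one collection partitioned across the available cores.
        Parallel.ForEach(workItems, item =>
        {
            Console.WriteLine("Processing item {0}", item);
        });

        // Task parallelism: independent operations scheduled concurrently.
        Task first = Task.Factory.StartNew(() => Console.WriteLine("CPU-bound step"));
        Task second = Task.Factory.StartNew(() => Console.WriteLine("I/O-bound step"));
        Task.WaitAll(first, second);
    }
}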

                                                                                                                Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

• Find pieces of code that are intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

                                                                                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• It is a trade-off between the risk of failure/poor user experience from not having excess capacity and the cost of idling VMs

                                                                                                                Performance Cost

                                                                                                                Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference depending on your app profile

• Caching and compression help a lot with storage costs

                                                                                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                                Compressing Content

1. Gzip all output content (see the Gzip sketch below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip + Minify JavaScript + Minify CSS + Minify Images → Compressed Content]
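A minimal sketch of gzipping a text payload with the .NET GZipStream class; the helper name and the idea of compressing before writing to storage are illustrative, not from the original deck.

using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipHelper
{
    // Compress a text payload (e.g. HTML, JSON, CSV) before storing or sending it.
    public static byte[] Compress(string content)
    {
        byte[] raw = Encoding.UTF8.GetBytes(content);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // The gzip stream must be closed before the compressed bytes are read back.
            return output.ToArray();
        }
    }
}

For dynamic ASP.NET responses, the same effect is typically achieved by enabling dynamic compression in IIS rather than in application code.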

                                                                                                                Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                                Measure everything

                                                                                                                Know your application profile in and out

                                                                                                                Research Examples in the Cloud

…on another set of slides

                                                                                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

                                                                                                                76

                                                                                                                Project Daytona - Map Reduce on Azure

                                                                                                                httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                                                                                                                77

                                                                                                                Questions and Discussionhellip

                                                                                                                Thank you for hosting me at the Summer School

                                                                                                                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700–1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

                                                                                                                It is easy to parallelize BLAST

• Segment the input (a splitting sketch follows below)

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing
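A minimal sketch of input segmentation: splitting a FASTA query file into partitions of a fixed number of sequences. The class and method names are illustrative assumptions; the actual AzureBLAST splitter is not shown in the deck.

using System.Collections.Generic;
using System.IO;

static class QuerySplitter
{
    // Split a FASTA file into partitions of at most 'sequencesPerPartition' sequences.
    public static List<List<string>> SplitFasta(string path, int sequencesPerPartition)
    {
        var partitions = new List<List<string>>();
        var current = new List<string>();
        int sequencesInCurrent = 0;

        foreach (string line in File.ReadLines(path))
        {
            if (line.StartsWith(">"))              // a header line starts a new sequence
            {
                if (sequencesInCurrent == sequencesPerPartition)
                {
                    partitions.Add(current);       // close the full partition
                    current = new List<string>();
                    sequencesInCurrent = 0;
                }
                sequencesInCurrent++;
            }
            current.Add(line);
        }

        if (current.Count > 0)
            partitions.Add(current);               // last, possibly short, partition
        return partitions;
    }
}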

                                                                                                                Large volume data

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth demand could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
  • argument "-a" of NCBI-BLAST
  • 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
  • Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
  • Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting in case of an instance failure
  • Best practice: estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
  • (A worker-loop sketch using the queue's visibility timeout follows below)
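A minimal sketch of how a worker might honor the visibilityTimeout estimate when pulling BLAST tasks from a queue. The queue name, the 90-minute timeout, and ProcessPartition are illustrative assumptions, not the actual AzureBLAST code.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlastWorker
{
    public static void Run(CloudStorageAccount account)
    {
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // The visibility timeout should exceed the longest expected task run time
            // (and stay under the 2-hour maximum noted above).
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(90));
            if (msg == null)
            {
                Thread.Sleep(5000);         // queue empty; back off briefly
                continue;
            }

            ProcessPartition(msg.AsString); // download partition, run NCBI-BLAST with "-a <cores>", upload result
            taskQueue.DeleteMessage(msg);   // delete only on success so a failed task becomes visible again
        }
    }

    private static void ProcessPartition(string partitionName)
    {
        Console.WriteLine("Processing partition {0}", partitionName);
    }
}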

[Diagram: a Splitting task fans out many BLAST tasks that run in parallel, followed by a Merging task]

Task size vs. performance
  • Benefit of the warm-cache effect
  • 100 sequences per partition is the best choice

Instance size vs. performance
  • Super-linear speedup with larger worker instances
  • Primarily due to the memory capacity

Task size / instance size vs. cost
  • Extra-large instances generated the best and most economical throughput
  • Fully utilize the resource

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role hosts the Job Scheduler, Scaling Engine, and database-updating role; worker instances pull work from a global dispatch queue; Azure Tables hold the job registry and NCBI database metadata, while Azure Blob storage holds the BLAST databases and temporary data; a Splitting task fans out BLAST tasks that a Merging task combines.]

ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state

[Diagram detail: the Web Portal and Web Service hand job registrations to the Job Scheduler, which records them in the Job Registry and drives the Scaling Engine.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
  • Discover the interrelationships of known protein sequences

"All against All" query
  • The database is also the input query
  • The protein database is large (4.2 GB)
  • In total, 9,865,668 sequences to be queried
  • Theoretically, 100 billion sequence comparisons

Performance estimation
  • Based on sample runs on one extra-large Azure instance
  • Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
  • Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), Western and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances appear, redistribute the load manually


• The total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total
  • All 62 compute nodes lost tasks and then came back as a group; this is an update domain (~30 mins, ~6 nodes in one group)
  • 35 nodes experienced a blob-writing failure at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed
  • A reasonable guess: the fault domain at work

                                                                                                                MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." - Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)

Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = latent heat of vaporization (J g-1)

Rn = net radiation (W m-2)

cp = specific heat capacity of air (J kg-1 K-1)

ρa = dry air density (kg m-3)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s-1)

gs = conductivity of plant stoma air (inverse of rs) (m s-1)

γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
  • Lots of inputs: big data reduction
  • Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / [(Δ + γ·(1 + ga/gs))·λv]        Penman-Monteith (1964)
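A direct transcription of the expression above into code, with inputs in the units listed with the symbol definitions; the function name and the default of γ = 66 Pa K-1 (taken from the symbol list) are the only assumptions.

// Penman-Monteith, as written above: ET = (Δ·Rn + ρa·cp·δq·ga) / [(Δ + γ·(1 + ga/gs))·λv]
static double Evapotranspiration(
    double delta,        // Δ  : rate of change of saturation specific humidity with air temperature (Pa/K)
    double rn,           // Rn : net radiation (W/m^2)
    double rhoA,         // ρa : dry air density (kg/m^3)
    double cp,           // cp : specific heat capacity of air (J/(kg K))
    double dq,           // δq : vapor pressure deficit (Pa)
    double ga,           // ga : conductivity of air (m/s)
    double gs,           // gs : conductivity of plant stoma (m/s)
    double lambdaV,      // λv : latent heat of vaporization (J/g)
    double gamma = 66.0) // γ  : psychrometric constant (Pa/K)
{
    return (delta * rn + rhoA * cp * dq * ga)
         / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
}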

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Input datasets:
  • NASA MODIS imagery source archives: 5 TB (600K files)
  • FLUXNET curated sensor dataset: 30 GB (960 files)
  • FLUXNET curated field dataset: 2 KB (1 file)
  • NCEP/NCAR: ~100 MB (4K files)
  • Vegetative clumping: ~5 MB (1 file)
  • Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
  • Downloads requested input tiles from NASA FTP sites
  • Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
  • Converts source tile(s) to intermediate-result sinusoidal tiles
  • Simple nearest-neighbor or spline algorithms

Derivation reduction stage
  • First stage visible to the scientist
  • Computes ET in our initial use

Analysis reduction stage
  • Optional second stage visible to the scientist
  • Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests; a Request Queue feeds the Data Collection Stage (pulling from the source imagery download sites and source metadata), followed by the Reprojection Queue/Stage, the Reduction 1 Queue (Derivation Reduction Stage), the Reduction 2 Queue (Analysis Reduction Stage), and a Download Queue from which scientists download the science results.]

                                                                                                                httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• ModisAzure Service is the web role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated worker role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
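As a rough illustration of this job-to-task fan-out (a sketch, not the actual MODISAzure code), the following C# uses the Windows Azure StorageClient library of the era; the TaskEntity type, the table and queue names, and the message format are assumptions made for illustration.

using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical task-status entity: one row per recoverable unit of work.
public class TaskEntity : TableServiceEntity
{
    public string JobId { get; set; }
    public string Payload { get; set; }
    public string Status { get; set; }
}

public class ServiceMonitorSketch
{
    private readonly CloudStorageAccount account =
        CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption: dev storage for the sketch

    // Parse one job into tasks, persist their status in a table, then enqueue them for workers.
    public void DispatchJob(string jobId, IEnumerable<string> taskPayloads)
    {
        CloudTableClient tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("TaskStatus");                   // hypothetical table name
        TableServiceContext context = tableClient.GetDataServiceContext();

        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojection-tasks"); // hypothetical
        taskQueue.CreateIfNotExist();

        List<TaskEntity> tasks = new List<TaskEntity>();
        int taskNumber = 0;
        foreach (string payload in taskPayloads)
        {
            TaskEntity task = new TaskEntity
            {
                PartitionKey = jobId,
                RowKey = (taskNumber++).ToString("D6"),
                JobId = jobId,
                Payload = payload,
                Status = "Queued"
            };
            context.AddObject("TaskStatus", task);
            tasks.Add(task);
        }
        context.SaveChanges();                                             // persist status before dispatching

        foreach (TaskEntity task in tasks)
        {
            // Workers look the task entity up by these keys.
            taskQueue.AddMessage(new CloudQueueMessage(jobId + "|" + task.RowKey));
        }
    }
}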

[Diagram: for each pipeline stage, the MODISAzure Service (web role) persists a <PipelineStage>JobStatus entry for every <PipelineStage>Request and places it on the <PipelineStage>Job queue; the Service Monitor (worker role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work onto the <PipelineStage>Task queue.]

All work is actually done by a worker role
• Sandboxes science or other executables
• Marshals all storage between Azure blob storage and local Azure worker instance files

[Diagram: the Service Monitor (worker role) parses and persists <PipelineStage>TaskStatus entries and dispatches onto the <PipelineStage>Task queue; GenericWorker (worker role) instances pull tasks from that queue and read/write the <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
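A minimal sketch of such a dequeue-and-retry loop, again with the StorageClient library of the time; the queue name (reused from the previous sketch), the poison-message handling, and the RunTask helper are assumptions, not the actual GenericWorker code.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorkerSketch
{
    public void PollTaskQueue()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojection-tasks");

        while (true)
        {
            // Hide the message long enough for the task to run; it reappears if this instance crashes.
            CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (message == null)
            {
                System.Threading.Thread.Sleep(TimeSpan.FromSeconds(10)); // back off when the queue is empty
                continue;
            }

            if (message.DequeueCount > 3)
            {
                // Already failed 3 times: drop the poison message (status would be recorded as failed).
                taskQueue.DeleteMessage(message);
                continue;
            }

            RunTask(message.AsString);        // hypothetical: execute the sandboxed task
            taskQueue.DeleteMessage(message); // only delete on success
        }
    }

    private void RunTask(string taskId)
    {
        // Marshal blobs to local storage, run the science executable, persist task status.
    }
}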

[Diagram: a Reprojection Request flows through the Service Monitor (worker role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to GenericWorker (worker role) instances via the job and task queues; tasks point to Reprojection Data Storage and Swath Source Data Storage. Table callouts:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile]

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Per-stage scale and cost (requests enter through the AzureMODIS Service web role portal; total about $1420):

Stage                        Data                     Compute                     Cost
Data Collection (download)   400-500 GB, 60K files    11 hours at 10 MB/sec,      $50 upload, $450 storage
                                                      <10 workers
Reprojection                 400 GB, 45K files        3500 hours, 20-100 workers  $420 CPU, $60 download
Derivation Reduction         5-7 GB, 55K files        1800 hours, 20-100 workers  $216 CPU, $1 download, $6 storage
Analysis Reduction           <10 GB, ~1K files        1800 hours, 20-100 workers  $216 CPU, $2 download, $9 storage

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                • Day 2 - Azure as a PaaS
                                                                                                                • Day 2 - Applications


Code – TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
            {
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            }
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
            {
                context = Client.GetDataServiceContext();
            }
            return context;
        }
    }

    // (remaining members shown on the following slides)


Code – TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}


Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (!allAttendees.ContainsKey(email))   // nothing to delete if the attendee is unknown
    {
        return;
    }
    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}


Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
    {
        ReadAllAttendees();
    }
    if (allAttendees.ContainsKey(email))
    {
        return allAttendees[email];
    }
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables; an alternative that filters on the server side is sketched below.
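For tables that do not fit in memory, one common alternative is to let the table service do the filtering and only materialize the matching entities. A minimal sketch, written as an additional TableHelper method and assuming the same Context, tableName, and AttendeeEntity (with its Email property) used above; note that filters on ordinary properties are not indexed the way PartitionKey/RowKey filters are.

// Requires "using System.Linq;" at the top of the file.
public AttendeeEntity GetAttendeeByQuery(string email)
{
    // Push the filter to the table service instead of caching the whole table.
    CloudTableQuery<AttendeeEntity> query =
        (from a in Context.CreateQuery<AttendeeEntity>(tableName)
         where a.Email == email
         select a).AsTableServiceQuery();

    foreach (AttendeeEntity attendee in query)
    {
        return attendee; // first match wins
    }
    return null;
}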


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
    {
        Context.SaveChanges();
    }
}


Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.


Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• The fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• A common mistake: splitting code into multiple roles, each of which barely uses its CPU
• Strike a balance between using up the CPU and keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch after this list)
    • Data parallelism
    • Task parallelism
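As a rough illustration (not from the original deck), here is a minimal .NET 4 Task Parallel Library sketch showing both flavors; the LoadTileNames and ProcessTile helpers are hypothetical stand-ins for real work.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TplSketch
{
    public static void Run()
    {
        // Data parallelism: apply the same operation to every item, spread across all cores.
        List<string> tiles = LoadTileNames();
        Parallel.ForEach(tiles, tile => ProcessTile(tile));

        // Task parallelism: run different operations concurrently and wait for both.
        Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading input"));
        Task report   = Task.Factory.StartNew(() => Console.WriteLine("writing status report"));
        Task.WaitAll(download, report);
    }

    private static List<string> LoadTileNames()
    {
        return new List<string> { "tile1", "tile2" }; // placeholder data
    }

    private static void ProcessTile(string tile)
    {
        Console.WriteLine("processing " + tile);
    }
}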

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from lacking excess capacity and the cost of idling VMs (a simple queue-based scaling check is sketched below)
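One simple input to such a scaling decision is the backlog per worker. A hedged sketch follows, assuming a task queue named "reprojection-tasks" and a per-instance backlog target chosen by experiment; actually changing the instance count would go through the Windows Azure Service Management API, which is only indicated by a comment here.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class ScalingCheckSketch
{
    // Suggest an instance count based on the current queue backlog.
    public static int SuggestInstanceCount(int currentInstances, int messagesPerInstanceTarget)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojection-tasks");

        int backlog = taskQueue.RetrieveApproximateMessageCount(); // approximate by design
        int suggested = Math.Max(1, (int)Math.Ceiling((double)backlog / messagesPerInstanceTarget));

        // Applying the change would mean updating the instance count in the service
        // configuration through the Windows Azure Service Management API (not shown).
        return Math.Min(suggested, currentInstances * 2); // grow conservatively
    }
}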

Performance & Cost

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing: they help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content
1. Gzip all output content (a response-filter sketch follows the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; minify JavaScript, CSS, and images along the way.]
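As a rough sketch of point 1 above (not from the original deck), gzip can be applied to an ASP.NET web role's output with a response filter; a production version would usually also set the Vary header and skip content that is already compressed.

using System;
using System.IO.Compression;
using System.Web;

// Global.asax.cs of the web role (sketch).
public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string acceptEncoding = Request.Headers["Accept-Encoding"];
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            // Wrap the output stream so everything written to the response is gzipped.
            Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
            Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}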

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • A Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 to 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern (a split-and-enqueue sketch follows the citation below)
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
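To make the query-segmentation pattern concrete, here is a hedged sketch (not the AzureBLAST implementation) that splits the input sequences into fixed-size partitions, uploads each partition to blob storage, and enqueues one BLAST task per partition; the container and queue names and the plain-text partition format are assumptions.

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QuerySegmentationSketch
{
    // Split the input sequences into partitions and enqueue one task per partition.
    public static void SubmitJob(string jobId, IList<string> sequences, int sequencesPerPartition)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption

        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        CloudBlobContainer container = blobClient.GetContainerReference("blast-input"); // hypothetical
        container.CreateIfNotExist();

        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blast-tasks"); // hypothetical
        taskQueue.CreateIfNotExist();

        for (int start = 0, partition = 0; start < sequences.Count; start += sequencesPerPartition, partition++)
        {
            // Write this partition's sequences to a blob...
            StringBuilder sb = new StringBuilder();
            for (int i = start; i < Math.Min(start + sequencesPerPartition, sequences.Count); i++)
            {
                sb.AppendLine(sequences[i]);
            }
            string blobName = jobId + "/partition-" + partition;
            container.GetBlobReference(blobName).UploadText(sb.ToString());

            // ...and enqueue a task that points at it; a merge task combines results when all tasks finish.
            taskQueue.AddMessage(new CloudQueueMessage(blobName));
        }
    }
}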

A simple split/join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait in case of an instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation (see the GetMessage sketch below)
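A minimal sketch of what setting that timeout looks like with the StorageClient queue API; the 90-minute value is only an illustration of staying under the 2-hour maximum, and the queue name is the hypothetical one used earlier.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class VisibilityTimeoutSketch
{
    public static CloudQueueMessage DequeueBlastTask()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blast-tasks"); // hypothetical

        // Hide the message for the estimated task duration; if this instance fails,
        // the message reappears after the timeout and another worker picks it up.
        return taskQueue.GetMessage(TimeSpan.FromMinutes(90)); // must stay under the 2-hour maximum
    }
}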

[Diagram: a splitting task fans out into many BLAST tasks that run in parallel, followed by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: AzureBLAST. A web role hosts the web portal and web service for job registration; a job management role runs the job scheduler and scaling engine and dispatches work to worker instances through a global dispatch queue; a database-updating role refreshes the NCBI databases; Azure Tables hold the job registry, and Azure Blobs hold the BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication and authorization are based on Live ID.

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (a registry-entity sketch follows the diagram below)

[Diagram: the web portal and web service hand job registrations to the job scheduler; the job portal, scaling engine, and job registry table coordinate execution.]
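A hedged sketch of persisting an accepted job into such a registry table with the StorageClient table API; the JobEntity shape, the table name, and the key choices are assumptions for illustration, not the actual AzureBLAST schema.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical job registry entity: one row per accepted job.
public class JobEntity : TableServiceEntity
{
    public string Owner { get; set; }
    public string Status { get; set; }        // e.g. "Accepted", "Running", "Completed"
    public DateTime SubmittedUtc { get; set; }
}

public static class JobRegistrySketch
{
    public static void RegisterJob(string jobId, string owner)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true"); // assumption
        CloudTableClient tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("JobRegistry"); // hypothetical table name

        TableServiceContext context = tableClient.GetDataServiceContext();
        context.AddObject("JobRegistry", new JobEntity
        {
            PartitionKey = owner,         // all state lives in the table, not in web-role memory
            RowKey = jobId,
            Owner = owner,
            Status = "Accepted",
            SubmittedUtc = DateTime.UtcNow
        });
        context.SaveChanges();
    }
}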

R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
  • In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know; experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record is a matched pair of "Executing" and "is done" entries; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

Note the gap after task 251600 completes at 7:02, and that task 251774 never reports completion.
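For illustration only, a minimal C# sketch (not AzureBLAST's actual tooling) of scanning a worker log for tasks that have an "Executing" record but no matching "is done" record; it assumes the log format shown in the sample above:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    static class LogChecker
    {
        public static void FindLostTasks(string logFile)
        {
            var started = new HashSet<string>();
            var executing = new Regex(@"Executing the task (\d+)");
            var done = new Regex(@"Execution of task (\d+) is done");

            foreach (string line in File.ReadLines(logFile))
            {
                Match m;
                if ((m = executing.Match(line)).Success) started.Add(m.Groups[1].Value);
                else if ((m = done.Match(line)).Success) started.Remove(m.Groups[1].Value);
            }

            // Anything still in 'started' was dispatched but never reported completion.
            foreach (string taskId in started)
                Console.WriteLine("Task {0} has no completion record", taskId);
        }
    }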

North Europe data center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart; this is an update domain at work.
• 35 nodes experienced blob-writing failures at the same time.

West Europe data center: 30,976 tasks were completed before the job was killed.
• A reasonable guess: the fault domain mechanism was at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
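As a quick illustration (not part of the original deck), a minimal C# sketch of the Penman-Monteith calculation above; the parameter names follow the symbol list, and reconciling the input units is assumed to be done by the caller:

    static double PenmanMonteithET(
        double delta,        // Δ: slope of saturation specific humidity vs. temperature (Pa/K)
        double rn,           // Rn: net radiation (W/m^2)
        double rhoA,         // ρa: dry air density (kg/m^3)
        double cp,           // cp: specific heat capacity of air (J/(kg K))
        double dq,           // δq: vapor pressure deficit (Pa)
        double ga,           // ga: aerodynamic conductivity (m/s)
        double gs,           // gs: stomatal conductivity (m/s)
        double lambdaV,      // λv: latent heat of vaporization
        double gamma = 66.0) // γ: psychrometric constant (Pa/K)
    {
        double numerator = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }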

Input data sets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR reanalysis data: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists interact with the AzureMODIS Service Web Role portal, which places work on a Request Queue; a Download Queue drives the Data Collection stage, which pulls from the source imagery download sites; the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue drive the Reprojection, Derivation Reduction, and Analysis Reduction stages; source metadata is kept alongside, and scientific results are available for download.]

                                                                                                                  httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
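A minimal sketch of what that front-door flow might look like with the StorageClient-era API; the queue and table names, the JobStatusEntity shape, and the configuration setting are illustrative assumptions, not MODISAzure's actual code:

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical status entity; not the project's actual schema.
    public class JobStatusEntity : TableServiceEntity
    {
        public JobStatusEntity() { }
        public JobStatusEntity(string jobId)
        {
            PartitionKey = "ReprojectionJob";
            RowKey = jobId;
        }
        public string Status { get; set; }
    }

    public class FrontDoor
    {
        public void SubmitReprojectionJob(string jobId, string jobXml)
        {
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

            // 1. Queue the request for the Service Monitor to pick up.
            CloudQueue jobQueue = account.CreateCloudQueueClient()
                                         .GetQueueReference("reprojectionjobqueue");
            jobQueue.CreateIfNotExist();
            jobQueue.AddMessage(new CloudQueueMessage(jobXml));

            // 2. Persist job status so the portal can report progress.
            CloudTableClient tables = account.CreateCloudTableClient();
            tables.CreateTableIfNotExist("ReprojectionJobStatus");
            TableServiceContext context = tables.GetDataServiceContext();
            context.AddObject("ReprojectionJobStatus",
                              new JobStatusEntity(jobId) { Status = "Queued" });
            context.SaveChanges();
        }
    }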

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks onto the <PipelineStage>Task Queue.]

All the work is actually done by a generic Worker Role:
• Sandboxes the science (or other) executable
• Marshalls all storage to/from Azure blob storage and to/from local files on the Azure Worker instance

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read and write <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
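A minimal sketch of that generic-worker pattern under stated assumptions: the queue and container names, the three-attempt bookkeeping via DequeueCount, and the RunScienceExecutable helper are illustrative, not MODISAzure's code:

    using System;
    using System.IO;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class GenericWorkerLoop
    {
        const int MaxAttempts = 3;

        public void Run(CloudStorageAccount account)
        {
            CloudQueue taskQueue = account.CreateCloudQueueClient()
                                          .GetQueueReference("reprojectiontaskqueue");
            CloudBlobContainer data = account.CreateCloudBlobClient()
                                             .GetContainerReference("inputdata");

            while (true)
            {
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null) { Thread.Sleep(5000); continue; }

                // Give up on poison tasks after three delivery attempts.
                if (msg.DequeueCount > MaxAttempts) { taskQueue.DeleteMessage(msg); continue; }

                string taskId = msg.AsString;
                string localInput = Path.Combine(Path.GetTempPath(), taskId);

                // Marshall input from blob storage to the local instance disk,
                // run the sandboxed executable, and push results back to blobs.
                data.GetBlobReference(taskId).DownloadToFile(localInput);
                RunScienceExecutable(localInput);                          // hypothetical helper
                data.GetBlobReference(taskId + ".out").UploadFile(localInput + ".out");

                taskQueue.DeleteMessage(msg);   // only delete after success
            }
        }

        void RunScienceExecutable(string inputFile) { /* launch the science code here */ }
    }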

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work onto the Job and Task Queues; GenericWorker (Worker Role) instances point to the ScanTimeList and SwathGranuleMeta tables, Reprojection Data Storage, and Swath Source Data Storage.]

• Each job entity specifies a single reprojection job request
• Each task entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
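For illustration, a minimal sketch of querying one of these metadata tables with the table-service LINQ client; the entity shape and the assumption that the tile id is the partition key are guesses, not the project's schema:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical entity for the ScanTimeList table.
    public class ScanTimeEntity : TableServiceEntity
    {
        public DateTime ScanTime { get; set; }
    }

    public class MetadataQueries
    {
        readonly TableServiceContext context;

        public MetadataQueries(CloudStorageAccount account)
        {
            context = account.CreateCloudTableClient().GetDataServiceContext();
        }

        // List the satellite scan times that cover a target sinusoidal tile.
        public List<DateTime> GetScanTimes(string tileId)
        {
            var query = (from e in context.CreateQuery<ScanTimeEntity>("ScanTimeList")
                         where e.PartitionKey == tileId
                         select e).AsTableServiceQuery();   // handles continuation tokens
            return query.Select(e => e.ScanTime).ToList();
        }
    }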

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Costs overlaid on the pipeline diagram above:]

Data Collection stage: 400–500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; costs: $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 hours, 20–100 workers; costs: $420 CPU, $60 download
Derivation Reduction stage: 5–7 GB, 55K files, 1800 hours, 20–100 workers; costs: $216 CPU, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1800 hours, 20–100 workers; costs: $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                  • Day 2 - Azure as a PaaS
                                                                                                                  • Day 2 - Applications

Code – TableHelper.cs

    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();
        CloudTableQuery<AttendeeEntity> query =
            Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
        try
        {
            foreach (AttendeeEntity attendee in query)
            {
                allAttendees[attendee.Email] = attendee;
            }
        }
        catch (Exception)
        {
            // No entries in table - or other exception
        }
    }

Code – TableHelper.cs

    public void DeleteAttendee(string email)
    {
        if (allAttendees == null) ReadAllAttendees();
        if (!allAttendees.ContainsKey(email)) return;   // nothing to delete

        AttendeeEntity attendee = allAttendees[email];

        // Delete from the cloud table
        Context.DeleteObject(attendee);
        Context.SaveChanges();

        // Delete from the memory cache
        allAttendees.Remove(email);
    }

Code – TableHelper.cs

    public AttendeeEntity GetAttendee(string email)
    {
        if (allAttendees == null) ReadAllAttendees();
        if (allAttendees.ContainsKey(email)) return allAttendees[email];
        return null;
    }

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is just one of many design patterns for working with tables.

Pseudo Code – TableHelper.cs

    public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
    {
        foreach (AttendeeEntity attendee in updatedAttendees)
        {
            UpdateAttendee(attendee, false);
        }
        Context.SaveChanges(SaveChangesOptions.Batch);
    }

    public void UpdateAttendee(AttendeeEntity attendee)
    {
        UpdateAttendee(attendee, true);
    }

    private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
    {
        if (allAttendees.ContainsKey(attendee.Email))
        {
            AttendeeEntity existingAttendee = allAttendees[attendee.Email];
            existingAttendee.UpdateFrom(attendee);
            Context.UpdateObject(existingAttendee);
        }
        else
        {
            Context.AddObject(tableName, attendee);
        }
        if (saveChanges) Context.SaveChanges();
    }

Application Code – Cloud Tables

    private void SaveButton_Click(object sender, RoutedEventArgs e)
    {
        // Write to table
        tableHelper.UpdateAttendees(attendees);
    }

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                                                                                    Best Practices

                                                                                                                    Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• The fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                    Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance != one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which barely uses its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                                    Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library
    • Data parallelism
    • Task parallelism
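For example, a minimal .NET 4 sketch of the two TPL styles mentioned above (the work items shown are purely illustrative):

    using System;
    using System.Threading.Tasks;

    class ConcurrencyDemo
    {
        static void Main()
        {
            // Data parallelism: apply the same work to every item,
            // letting the TPL pick a degree of parallelism for the VM's cores.
            int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };
            Parallel.ForEach(workItems, item =>
            {
                Console.WriteLine("Processing item {0}", item);
            });

            // Task parallelism: run distinct pieces of work concurrently.
            Task compress = Task.Factory.StartNew(() => Console.WriteLine("Compressing output"));
            Task upload   = Task.Factory.StartNew(() => Console.WriteLine("Uploading to blob storage"));
            Task.WaitAll(compress, upload);
        }
    }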

                                                                                                                    Finding Good Code Neighbors

• Typically code is intensive in one or more of these resources: memory, CPU, network I/O, storage I/O
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                                    Scaling Appropriately

• Monitor your application and make sure you are scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a performance-versus-cost trade-off: the risk of failure or a poor user experience from not having excess capacity against the cost of having idling VMs

                                                                                                                    Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app's profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app's profile
• Caching and compressing help a lot with storage costs

                                                                                                                    Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places. Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                                    Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to compress, gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs against storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content; also minify JavaScript, CSS, and images.]
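A minimal sketch, assuming a standard ASP.NET pipeline (not code from the deck), of gzipping responses on the fly; you would call it from Application_BeginRequest in Global.asax:

    using System;
    using System.IO.Compression;
    using System.Web;

    public static class GzipHelper
    {
        public static void CompressResponseIfSupported(HttpApplication app)
        {
            string acceptEncoding = app.Request.Headers["Accept-Encoding"];
            if (string.IsNullOrEmpty(acceptEncoding) || !acceptEncoding.Contains("gzip"))
                return;   // client cannot decompress gzip

            // Wrap the output stream so everything written to the response is gzipped.
            app.Response.Filter = new GZipStream(app.Response.Filter, CompressionMode.Compress);
            app.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }

    // In Global.asax.cs:
    // protected void Application_BeginRequest(object sender, EventArgs e)
    // {
    //     GzipHelper.CompressResponseIfSupported((HttpApplication)sender);
    // }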

                                                                                                                    Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                                    Research Examples in the Cloud

…on another set of slides

                                                                                                                    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • A Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research is this week announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients

Project Daytona – Map Reduce on Azure

                                                                                                                    httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

Questions and Discussion…

                                                                                                                    Thank you for hosting me at the Summer School

                                                                                                                    BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern (sketched below)
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.
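To make the query-segmentation (split) step concrete, here is a minimal sketch assuming the 2010-era Microsoft.WindowsAzure.StorageClient library; the container name, queue name, partition size, and message format are illustrative placeholders rather than AzureBLAST's actual implementation.

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BlastJobSplitter
    {
        // Split the input sequences into fixed-size partitions, store each
        // partition as a blob, and enqueue one task message per partition.
        // Worker role instances dequeue the messages, BLAST their partition,
        // and a final merging task joins the per-partition results.
        public static void SubmitJob(CloudStorageAccount account, string jobId,
                                     IList<string> fastaSequences, int sequencesPerPartition)
        {
            CloudBlobContainer container =
                account.CreateCloudBlobClient().GetContainerReference("blast-partitions");
            container.CreateIfNotExist();

            CloudQueue taskQueue =
                account.CreateCloudQueueClient().GetQueueReference("blast-tasks");
            taskQueue.CreateIfNotExist();

            int partitionIndex = 0;
            for (int i = 0; i < fastaSequences.Count; i += sequencesPerPartition)
            {
                string[] partition = fastaSequences.Skip(i).Take(sequencesPerPartition).ToArray();
                string blobName = string.Format("{0}/partition-{1}.fasta", jobId, partitionIndex++);

                // Upload the partition so any worker instance can fetch it.
                container.GetBlobReference(blobName).UploadText(string.Join("\n", partition));

                // The queue message tells a worker which partition to process.
                taskQueue.AddMessage(new CloudQueueMessage(blobName));
            }
        }
    }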

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes (see the sketch below)
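For example, a worker can derive the "-a" value from the core count of the VM it is running on; a minimal sketch using the legacy blastall command line (the database name and file paths are placeholders):

    using System;
    using System.Diagnostics;

    public static class BlastRunner
    {
        // Run legacy NCBI blastall with one thread per core of this instance
        // (1, 2, 4 or 8 depending on the worker VM size).
        public static void RunPartition(string inputFasta, string outputFile)
        {
            int cores = Environment.ProcessorCount;
            string arguments = string.Format(
                "-p blastp -d nr -i {0} -o {1} -a {2}", inputFasta, outputFile, cores);

            using (Process blast = Process.Start("blastall.exe", arguments))
            {
                blast.WaitForExit();
            }
        }
    }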

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead

Best practice: use test runs for profiling, then set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task's run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before retry if the instance fails

Best practice
• Estimate the value from the number of pair-bases in the partition and from test runs (a sketch follows)
• Watch out for the 2-hour maximum limitation
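A hedged sketch of how a worker might choose the visibility timeout when dequeuing a BLAST task, assuming the classic CloudQueue.GetMessage(visibilityTimeout) API; the calibration constant is purely illustrative and would come from the profiling test runs mentioned above.

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public static class TaskDequeuer
    {
        // The queue service of the time capped visibility timeouts at 2 hours.
        private static readonly TimeSpan MaxVisibility = TimeSpan.FromHours(2);

        public static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue,
                                                         long pairBasesInPartition)
        {
            // Illustrative calibration: minutes per million pair-bases,
            // estimated from test runs.
            const double minutesPerMillionPairBases = 1.5;

            double estimatedMinutes = (pairBasesInPartition / 1e6) * minutesPerMillionPairBases;
            TimeSpan visibility = TimeSpan.FromMinutes(Math.Max(1.0, estimatedMinutes));
            if (visibility > MaxVisibility)
            {
                visibility = MaxVisibility;
            }

            // Too small: another worker re-runs the task before this one finishes.
            // Too large: a task from a failed instance stays hidden for a long time.
            return taskQueue.GetMessage(visibility);
        }
    }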

                                                                                                                    BLAST task

                                                                                                                    Splitting task

                                                                                                                    BLAST task

                                                                                                                    BLAST task

                                                                                                                    BLAST task

…

                                                                                                                    Merging Task

Task size vs. performance
• Benefits from the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size/instance size vs. cost
• The extra-large instance produced the best and most economical throughput
• It fully utilizes the resources

                                                                                                                    Web

                                                                                                                    Portal

                                                                                                                    Web

                                                                                                                    Service

                                                                                                                    Job registration

                                                                                                                    Job Scheduler

                                                                                                                    Worker

                                                                                                                    Worker

                                                                                                                    Worker

Global dispatch queue

                                                                                                                    Web Role

                                                                                                                    Azure Table

                                                                                                                    Job Management Role

                                                                                                                    Azure Blob

Database updating Role

…

                                                                                                                    Scaling Engine

(BLAST databases, temporary data, etc.)

                                                                                                                    Job Registry NCBI databases

                                                                                                                    BLAST task

                                                                                                                    Splitting task

                                                                                                                    BLAST task

                                                                                                                    BLAST task

                                                                                                                    BLAST task

…

                                                                                                                    Merging Task

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (see the sketch below)
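A minimal sketch of persisting an accepted job into the registry table, assuming the 2010-era TableServiceEntity/TableServiceContext API; the entity and property names are hypothetical, not AzureBLAST's actual schema.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // One row per accepted job, so no job state lives only in memory.
    public class JobRegistryEntry : TableServiceEntity
    {
        public JobRegistryEntry() { }

        public JobRegistryEntry(string jobId)
            : base("jobs", jobId)                // PartitionKey, RowKey
        {
        }

        public string Owner { get; set; }        // Live ID of the submitter
        public string Status { get; set; }       // e.g. "Accepted", "Running", "Done"
        public DateTime SubmittedUtc { get; set; }
    }

    public static class JobRegistry
    {
        public static void RegisterJob(CloudStorageAccount account, JobRegistryEntry job)
        {
            // Table assumed to exist already; a real service would create it at startup.
            TableServiceContext context =
                account.CreateCloudTableClient().GetDataServiceContext();
            context.AddObject("JobRegistry", job);
            context.SaveChanges();
        }
    }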

                                                                                                                    Web Portal

                                                                                                                    Web Service

                                                                                                                    Job registration

                                                                                                                    Job Scheduler

                                                                                                                    Job Portal

                                                                                                                    Scaling Engine

                                                                                                                    Job Registry

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores (475 extra-large VMs, 8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• The 10 million sequences were divided into multiple segments; each segment was submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalance occurs, the load is redistributed manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g., the task failed to complete). A log-mining sketch follows the records.

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
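One way to mine these logs is to pair every "Executing the task N" record with its "Execution of task N is done" record and flag tasks that never completed; a minimal sketch (the regular expressions assume the record format shown above):

    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    public static class BlastLogAnalyzer
    {
        // Report task IDs that have an "Executing" record but no matching
        // "done" record, i.e. tasks that were started but never completed.
        public static IEnumerable<string> FindIncompleteTasks(string logFile)
        {
            var started = new HashSet<string>();
            var finished = new HashSet<string>();

            var executing = new Regex(@"Executing the task (?<id>\d+)");
            var done = new Regex(@"Execution of task (?<id>\d+) is done");

            foreach (string line in File.ReadLines(logFile))
            {
                Match doneMatch = done.Match(line);
                if (doneMatch.Success)
                {
                    finished.Add(doneMatch.Groups["id"].Value);
                    continue;
                }

                Match execMatch = executing.Match(line);
                if (execMatch.Success)
                {
                    started.Add(execMatch.Groups["id"].Value);
                }
            }

            foreach (string id in started)
            {
                if (!finished.Contains(id))
                {
                    yield return id;   // candidate failure: node lost, VM recycled, etc.
                }
            }
        }
    }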

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group; this is an update domain
• ~30 mins
• ~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed

A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
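As a worked example, the per-pixel arithmetic of the derivation stage is just the Penman-Monteith formula above; a minimal sketch using the units listed earlier (no MODIS-specific data handling is shown):

    public static class Evapotranspiration
    {
        // Penman-Monteith (1964):
        //   ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
        // delta: Pa/K, rn: W/m2, rhoA: kg/m3, cp: J/(kg·K),
        // deltaQ: Pa, ga and gs: m/s, gamma ≈ 66 Pa/K, lambdaV: J/g
        public static double Compute(double delta, double rn, double rhoA, double cp,
                                     double deltaQ, double ga, double gs,
                                     double lambdaV, double gamma = 66.0)
        {
            double numerator = delta * rn + rhoA * cp * deltaQ * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }
    }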

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

                                                                                                                    Reduction 1

                                                                                                                    Queue

                                                                                                                    Source

                                                                                                                    Metadata

                                                                                                                    AzureMODIS

                                                                                                                    Service Web Role Portal

                                                                                                                    Request

                                                                                                                    Queue

                                                                                                                    Scientific

                                                                                                                    Results

                                                                                                                    Download

                                                                                                                    Data Collection Stage

                                                                                                                    Source Imagery Download Sites

                                                                                                                    Reprojection

                                                                                                                    Queue

                                                                                                                    Reduction 2

                                                                                                                    Queue

                                                                                                                    Download

                                                                                                                    Queue

                                                                                                                    Scientists

                                                                                                                    Science results

                                                                                                                    Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

<PipelineStage>Request …

<PipelineStage>JobStatus   Persist

<PipelineStage>Job Queue

MODISAzure Service
(Web Role)

Service Monitor
(Worker Role)

Parse & Persist <PipelineStage>TaskStatus

…

Dispatch

<PipelineStage>Task Queue

All the work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure worker-instance files

                                                                                                                    Service Monitor

                                                                                                                    (Worker Role)

Parse & Persist <PipelineStage>TaskStatus

GenericWorker
(Worker Role)

…

…

Dispatch

<PipelineStage>Task Queue

…

<Input>Data Storage

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (see the sketch below)
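A hedged sketch of that generic-worker loop, assuming the classic CloudQueue/CloudBlob API; the task message format, blob naming, and the RunScienceExecutable helper are placeholders for whatever the real pipeline stage uses.

    using System;
    using System.IO;
    using Microsoft.WindowsAzure.StorageClient;

    public static class GenericWorkerLoop
    {
        // Dequeue a task, stage its input locally, run the sandboxed executable,
        // upload the result, and only then delete the message. A task that has
        // already been dequeued three times is given up on.
        public static void ProcessOneTask(CloudQueue taskQueue, CloudBlobContainer data,
                                          string localScratchDir)
        {
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) return;                  // nothing to do right now

            if (msg.DequeueCount > 3)                 // already retried 3 times
            {
                // A real implementation would mark the task failed in the status table.
                taskQueue.DeleteMessage(msg);
                return;
            }

            string inputBlobName = msg.AsString;      // placeholder message format
            string localInput = Path.Combine(localScratchDir, Path.GetFileName(inputBlobName));
            string localOutput = localInput + ".out";

            // Marshal input from blob storage to local instance storage.
            data.GetBlobReference(inputBlobName).DownloadToFile(localInput);

            RunScienceExecutable(localInput, localOutput);   // sandboxed stage (placeholder)

            // Marshal the result back to blob storage.
            data.GetBlobReference(inputBlobName + ".out").UploadFile(localOutput);

            taskQueue.DeleteMessage(msg);             // success: remove the task
        }

        private static void RunScienceExecutable(string input, string output)
        {
            // Placeholder for launching the per-stage science executable.
            File.Copy(input, output, true);
        }
    }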

                                                                                                                    Reprojection Request

…

                                                                                                                    Service Monitor

                                                                                                                    (Worker Role)

                                                                                                                    ReprojectionJobStatus Persist

Parse & Persist ReprojectionTaskStatus

                                                                                                                    GenericWorker

                                                                                                                    (Worker Role)

…

                                                                                                                    Job Queue

…

                                                                                                                    Dispatch

                                                                                                                    Task Queue

                                                                                                                    Points to

…

                                                                                                                    ScanTimeList

                                                                                                                    SwathGranuleMeta

                                                                                                                    Reprojection Data

                                                                                                                    Storage

Each entity specifies a single reprojection job request

Each entity specifies a single reprojection task (i.e., a single tile)

Query this table to get geo-metadata (e.g., boundaries) for each swath tile

Query this table to get the list of satellite scan times that cover a target tile (a query sketch follows)

Swath Source Data Storage
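A hedged sketch of the kind of lookup the last annotation describes, assuming the 2010-era TableServiceContext LINQ support; the entity shape and the PartitionKey/RowKey convention (tile id / scan time) are assumptions for illustration, not MODISAzure's actual schema.

    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    // Illustrative entity: one row per (tile, scan time) pair.
    public class ScanTimeEntry : TableServiceEntity
    {
        public string GranuleName { get; set; }
    }

    public static class ScanTimeLookup
    {
        // Returns the satellite scan times recorded for a target tile,
        // assuming PartitionKey = tile id and RowKey = scan time.
        public static ScanTimeEntry[] GetScanTimesForTile(TableServiceContext context, string tileId)
        {
            return context.CreateQuery<ScanTimeEntry>("ScanTimeList")
                          .Where(e => e.PartitionKey == tileId)
                          .ToArray();
        }
    }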

• Computational costs are driven by the data scale and the need to run the reduction stages multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

                                                                                                                    Reduction 1 Queue

                                                                                                                    Source Metadata

                                                                                                                    Request Queue

                                                                                                                    Scientific Results Download

                                                                                                                    Data Collection Stage

                                                                                                                    Source Imagery Download Sites

                                                                                                                    Reprojection Queue

                                                                                                                    Reduction 2 Queue

                                                                                                                    Download

                                                                                                                    Queue

                                                                                                                    Scientists

                                                                                                                    Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

Stage                 Data                    Transfer/Compute     Workers   Cost
Data collection       400-500 GB, 60K files   10 MB/sec, 11 hours  <10       $50 upload, $450 storage
Reprojection          400 GB, 45K files       3500 hours           20-100    $420 CPU, $60 download
Derivation reduction  5-7 GB, 55K files       1800 hours           20-100    $216 CPU, $1 download, $6 storage
Analysis reduction    <10 GB, ~1K files       1800 hours           20-100    $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                                    • Day 2 - Applications

Code – TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (!allAttendees.ContainsKey(email)) return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}

Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null) ReadAllAttendees();
    if (allAttendees.ContainsKey(email)) return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
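The helpers above assume a ReadAllAttendees method that fills the in-memory dictionary; here is a sketch of what it might look like inside the same TableHelper class, reusing the Context, tableName, and Email members the code above already relies on (the original implementation is not shown in the deck).

    // Requires: using System.Collections.Generic;
    private Dictionary<string, AttendeeEntity> allAttendees;

    // Load every attendee into the in-memory cache, keyed by email.
    // Only sensible because this table comfortably fits in memory.
    private void ReadAllAttendees()
    {
        allAttendees = new Dictionary<string, AttendeeEntity>();

        foreach (AttendeeEntity attendee in Context.CreateQuery<AttendeeEntity>(tableName))
        {
            allAttendees[attendee.Email] = attendee;
        }
    }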

Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges) Context.SaveChanges();
}

Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools – Fiddler2

                                                                                                                      Best Practices

                                                                                                                      Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                      Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each barely using its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                                      Exploiting Concurrency

                                                                                                                      bull Spin up additional processes each with a specific task or as a unit of concurrency

                                                                                                                      bull May not be ideal if number of active processes exceeds number of cores

                                                                                                                      bull Use multithreading aggressively

bull In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the right number of threads

bull In .NET 4, use the Task Parallel Library (a minimal sketch follows this list)

                                                                                                                      bull Data parallelism

                                                                                                                      bull Task parallelism
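A minimal C# sketch of both styles, assuming .NET 4; the work items and names (ProcessPartition, the partition list) are purely illustrative:

using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    static void Main()
    {
        var partitions = new List<string> { "p1", "p2", "p3", "p4" };

        // Data parallelism: apply the same operation to every element,
        // letting the TPL decide how many cores to use.
        Parallel.ForEach(partitions, p => ProcessPartition(p));

        // Task parallelism: run different operations concurrently.
        Task download = Task.Factory.StartNew(() => ProcessPartition("download"));
        Task index = Task.Factory.StartNew(() => ProcessPartition("index"));
        Task.WaitAll(download, index);
    }

    static void ProcessPartition(string name)
    {
        // Placeholder for real CPU-bound work.
    }
}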

                                                                                                                      Finding Good Code Neighbors

bull Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, storage I/O intensive

bull Find code that is intensive in different resources to live together

bull Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

                                                                                                                      Scaling Appropriately

bull Monitor your application and make sure you're scaled appropriately (not over-scaled)

                                                                                                                      bull Spinning VMs up and down automatically is good at large scale

                                                                                                                      bull Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

                                                                                                                      bull Being too aggressive in spinning down VMs can result in poor user experience

bull Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs

                                                                                                                      Performance Cost

                                                                                                                      Storage Costs

                                                                                                                      bull Understand an applicationrsquos storage profile and how storage billing works

                                                                                                                      bull Make service choices based on your app profile

bull E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction (see the worked illustration after this list)

                                                                                                                      bull Service choice can make a big cost difference based on your app profile

bull Caching and compressing: they help a lot with storage costs
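As a purely illustrative calculation (the transaction rate below is an assumption for the example, not a statement of actual prices): if table transactions were billed at $0.01 per 10,000, an application issuing 100 million table transactions a month would pay 100,000,000 / 10,000 x $0.01 = $100 a month in transaction fees alone, on top of per-GB storage, while a flat-fee database costs the same whether it serves one request or a hundred million. Caching reads and batching writes directly shrink that transaction count, which is why your app's profile decides which service is cheaper.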

                                                                                                                      Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

                                                                                                                      Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth often leads to savings in other places

                                                                                                                      Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                                                      Compressing Content

1. Gzip all output content (see the ASP.NET sketch below)

                                                                                                                      bull All modern browsers can decompress on the fly

bull Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

                                                                                                                      bull Use Portable Network Graphics (PNGs)

                                                                                                                      bull Crush your PNGs

                                                                                                                      bull Strip needless metadata

                                                                                                                      bull Make all PNGs palette PNGs

(Diagram: uncompressed content is reduced by gzipping output and minifying JavaScript, CSS, and images before being served as compressed content.)
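A minimal sketch of the gzip point above, assuming an ASP.NET web role and a hook in Global.asax (names are illustrative, and this is only one way to enable compression; IIS can also do it for you):

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpRequest request = HttpContext.Current.Request;
        HttpResponse response = HttpContext.Current.Response;

        string acceptEncoding = request.Headers["Accept-Encoding"];
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            // Everything written to the response is compressed on the fly.
            response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
            response.AppendHeader("Content-Encoding", "gzip");
            response.AppendHeader("Vary", "Accept-Encoding");
        }
    }
}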

                                                                                                                      Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                                      Measure everything

                                                                                                                      Know your application profile in and out

                                                                                                                      Research Examples in the Cloud

...on another set of slides

                                                                                                                      Map Reduce on Azure

bull Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web
bull Hadoop implementation
bull Hadoop has a long history and has been improved for stability
bull Originally designed for cluster systems

bull Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
bull Designed from the start to use cloud primitives
bull Built-in fault tolerance
bull REST-based interface for writing your own clients


                                                                                                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion...

                                                                                                                      Thank you for hosting me at the Summer School

                                                                                                                      BLAST (Basic Local Alignment Search Tool)

                                                                                                                      bull The most important software in bioinformatics

                                                                                                                      bull Identify similarity between bio-sequences

                                                                                                                      Computationally intensive

                                                                                                                      bull Large number of pairwise alignment operations

bull A BLAST run can take 700-1000 CPU hours

                                                                                                                      bull Sequence databases growing exponentially

                                                                                                                      bull GenBank doubled in size in about 15 months

                                                                                                                      It is easy to parallelize BLAST

                                                                                                                      bull Segment the input

                                                                                                                      bull Segment processing (querying) is pleasingly parallel

bull Segment the database (e.g. mpiBLAST)

                                                                                                                      bull Needs special result reduction processing

                                                                                                                      Large volume data

bull A normal BLAST database can be as large as 10 GB

bull With 100 nodes, distributing the database means the peak storage traffic could reach 1 TB (100 x 10 GB)

                                                                                                                      bull The output of BLAST is usually 10-100x larger than the input

                                                                                                                      bull Parallel BLAST engine on Azure

bull Query-segmentation data-parallel pattern (see the splitting sketch after the citation below)
bull Split the input sequences

                                                                                                                      bull query partitions in parallel

                                                                                                                      bull merge results together when done

bull Follows the general suggested application model: Web Role + Queue + Worker

bull With three special considerations:
bull Batch job management

                                                                                                                      bull Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.
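As a minimal, illustrative sketch of the query-segmentation split (the file handling, partition size, and helper names are assumptions, not the AzureBLAST code): read a FASTA input and group the sequences into fixed-size partitions, each of which would be enqueued as one BLAST task; a merging task concatenates the per-partition results when all tasks are done.

using System.Collections.Generic;
using System.IO;
using System.Text;

static class FastaSplitter
{
    // Groups the sequences in a FASTA file into partitions of a fixed size.
    // Each partition would be serialized and enqueued as one BLAST task.
    public static IEnumerable<List<string>> Split(string fastaPath, int sequencesPerPartition)
    {
        var partition = new List<string>();
        var current = new StringBuilder();

        foreach (string line in File.ReadLines(fastaPath))
        {
            if (line.StartsWith(">") && current.Length > 0)
            {
                // A new header closes the previous sequence.
                partition.Add(current.ToString());
                current.Clear();
                if (partition.Count == sequencesPerPartition)
                {
                    yield return partition;
                    partition = new List<string>();
                }
            }
            current.AppendLine(line);
        }

        if (current.Length > 0) partition.Add(current.ToString());
        if (partition.Count > 0) yield return partition;   // the last, possibly smaller, partition
    }
}

A worker would dequeue one partition, run BLAST over it (with "-a" set to the instance's core count, as noted below), and upload the result to blob storage; 100 sequences per partition is the size the measurements later in this deck found best.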

                                                                                                                      A simple SplitJoin pattern

Leverage the multiple cores of one instance: the "-a" argument of NCBI-BLAST

bull -a = 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity:
bull Large partitions: load imbalance

bull Small partitions: unnecessary overheads
bull NCBI-BLAST overhead

bull Data transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task:
bull Essentially an estimate of the task run time

bull Too small: repeated computation

bull Too large: unnecessarily long waiting time in case of an instance failure
Best practice:

bull Estimate the value based on the number of pair-bases in the partition and on test runs

bull Watch out for the 2-hour maximum limitation (a minimal sketch of dequeuing a task with a visibility timeout follows)
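A minimal sketch, assuming the Windows Azure StorageClient library of this era; the queue name, the 90-minute estimate, and RunBlastTask are illustrative placeholders, not the AzureBLAST implementation:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastTaskWorker
{
    static void Main()
    {
        // Development storage for the sketch; a real worker would read its
        // account settings from the role configuration.
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");

        // The visibility timeout is the run-time estimate for one partition:
        // too small and another worker re-runs the task, too large and a failed
        // instance leaves the task invisible for a long time. Must stay under 2 hours.
        TimeSpan visibilityTimeout = TimeSpan.FromMinutes(90);

        CloudQueueMessage message = queue.GetMessage(visibilityTimeout);
        if (message != null)
        {
            RunBlastTask(message.AsString);   // hypothetical helper: run BLAST on the partition
            queue.DeleteMessage(message);     // delete only after the work is safely done
        }
    }

    static void RunBlastTask(string taskDescription) { /* run BLAST, upload results */ }
}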

(Diagram: a splitting task produces many BLAST tasks that run in parallel, followed by a merging task.)

                                                                                                                      Task size vs Performance

                                                                                                                      bull Benefit of the warm cache effect

                                                                                                                      bull 100 sequences per partition is the best choice

                                                                                                                      Instance size vs Performance

                                                                                                                      bull Super-linear speedup with larger size worker instances

bull Primarily due to the larger memory capacity

Task Size / Instance Size vs Cost

bull Extra-large instances generated the best and most economical throughput
bull They fully utilize the resources

(Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler, Scaling Engine, and database-updating role; Worker instances are fed from a global dispatch queue; Azure Tables hold the Job Registry and NCBI database metadata; Azure Blobs hold the BLAST databases, temporary data, etc.)


ASP.NET program hosted by a web role instance:
bull Submit jobs

bull Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
bull Fault tolerance: avoid in-memory state

(Diagram: the Job Portal comprises the Web Portal and Web Service, handling job registration and feeding the Job Scheduler, Scaling Engine, and Job Registry.)

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences):
bull Against all NCBI non-redundant proteins: completed in 30 min

bull Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly reduced computing time...

Discovering homologs:
bull Discover the interrelationships of known protein sequences

"All against All" query:
bull The database is also the input query

bull The protein database is large (42 GB)
bull 9,865,668 sequences in total to be queried

                                                                                                                      bull Theoretically 100 billion sequence comparisons

Performance estimation:
bull Based on sample runs on one extra-large Azure instance

bull Would require 3,216,731 minutes (~6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know:
bull Experiments at this scale are usually infeasible for most scientists

bull Allocated a total of ~4000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe

bull 8 deployments of AzureBLAST
bull Each deployment has its own co-located storage service

bull Divide the 10 million sequences into multiple segments
bull Each segment is submitted to one deployment as one job for execution

                                                                                                                      bull Each segment consists of smaller partitions

bull When load imbalance occurs, redistribute the load manually


                                                                                                                      bull Total size of the output result is ~230GB

bull The total number of hits is 1,764,579,487

bull Started on March 25th; the last task completed on April 8th (10 days of compute)

bull But based on our estimates, the real working instance time should be 6-8 days

bull Look into the log data to analyze what took place...


A normal log record should be:

Otherwise something is wrong (e.g. a task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, ~30 mins); this is an update domain at work

35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed, then the job was killed

A reasonable guess: the Fault Domain is at work

                                                                                                                      MODISAzure Computing Evapotranspiration (ET) in The Cloud

"You never miss the water till the well has run dry." (Irish proverb)

                                                                                                                      ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J/g)

                                                                                                                      Rn = Net radiation (W m-2)

                                                                                                                      cp = Specific heat capacity of air (J kg-1 K-1)

                                                                                                                      ρa = dry air density (kg m-3)

                                                                                                                      δq = vapor pressure deficit (Pa)

                                                                                                                      ga = Conductivity of air (inverse of ra) (m s-1)

                                                                                                                      gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
bull Lots of inputs; big data reduction
bull Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

                                                                                                                      Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
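For illustration only, here is a direct C# transcription of the Penman-Monteith expression above (the function and parameter names are arbitrary, and any unit conversions needed to reach m3 s-1 m-2 are left to the caller):

static double PenmanMonteithET(
    double delta,    // Δ: rate of change of saturation specific humidity with temperature (Pa/K)
    double rn,       // Rn: net radiation (W/m^2)
    double rhoA,     // ρa: dry air density (kg/m^3)
    double cp,       // cp: specific heat capacity of air (J/(kg K))
    double deltaQ,   // δq: vapor pressure deficit (Pa)
    double ga,       // ga: conductivity of air (m/s)
    double gs,       // gs: conductivity of plant stoma (m/s)
    double gamma,    // γ: psychrometric constant (~66 Pa/K)
    double lambdaV)  // λv: latent heat of vaporization (J/g)
{
    return (delta * rn + rhoA * cp * deltaQ * ga)
         / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
}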

Input data sources:
bull NASA MODIS imagery source archives: 5 TB (600K files)
bull FLUXNET curated sensor dataset: 30 GB (960 files)
bull FLUXNET curated field dataset: 2 KB (1 file)
bull NCEP/NCAR: ~100 MB (4K files)
bull Vegetative clumping: ~5 MB (1 file)
bull Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
bull Downloads requested input tiles from NASA FTP sites
bull Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
bull Converts source tile(s) to intermediate-result sinusoidal tiles
bull Simple nearest neighbor or spline algorithms

Derivation reduction stage
bull First stage visible to the scientist
bull Computes ET in our initial use

Analysis reduction stage
bull Optional second stage visible to the scientist
bull Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(Pipeline diagram: scientists submit work to the AzureMODIS Service Web Role Portal via a request queue; the Data Collection Stage pulls source imagery from download sites via a download queue; the Reprojection, Derivation Reduction (Reduction 1), and Analysis Reduction (Reduction 2) stages each have their own queue, with source metadata tracked alongside; scientific results are then downloaded by the scientists.)

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

bull ModisAzure Service is the Web Role front door
bull Receives all user requests
bull Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
bull Service Monitor is a dedicated Worker Role
bull Parses all job requests into tasks - recoverable units of work
bull Execution status of all jobs and tasks is persisted in Tables

(Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.)

All work is actually done by a Worker Role
bull Sandboxes the science or other executable
bull Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

(Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read their <Input>Data Storage.)

                                                                                                                      bull Dequeues tasks created by the Service Monitor

                                                                                                                      bull Retries failed tasks 3 times

bull Maintains all task status (a minimal dequeue-loop sketch follows)
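A minimal sketch of that dequeue/retry behavior, again assuming the StorageClient library of this era; the visibility timeout, the retry limit of 3 (taken from the bullet above), and the helper names are illustrative, not the MODISAzure implementation:

using System;
using Microsoft.WindowsAzure.StorageClient;

class GenericWorkerLoop
{
    static void Run(CloudQueue taskQueue)
    {
        while (true)
        {
            CloudQueueMessage task = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (task == null) { System.Threading.Thread.Sleep(5000); continue; }

            // DequeueCount reports how many times this task has already been handed out;
            // after 3 failed attempts, record the failure and stop retrying.
            if (task.DequeueCount > 3)
            {
                PersistTaskStatus(task.AsString, "Failed");   // hypothetical table write
                taskQueue.DeleteMessage(task);
                continue;
            }

            ExecuteSandboxedTask(task.AsString);              // hypothetical executable launch
            PersistTaskStatus(task.AsString, "Done");
            taskQueue.DeleteMessage(task);
        }
    }

    static void ExecuteSandboxedTask(string task) { /* ... */ }
    static void PersistTaskStatus(string task, string status) { /* ... */ }
}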

(Diagram, reprojection example: a Reprojection Request arrives; the Service Monitor (Worker Role) persists ReprojectionJobStatus (each entity specifies a single reprojection job request), parses and persists ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), and dispatches work from the Job Queue to the Task Queue for GenericWorker (Worker Role) instances. The ScanTimeList table is queried for the list of satellite scan times that cover a target tile, and the SwathGranuleMeta table for geo-metadata (e.g. boundaries) for each swath tile; workers read Swath Source Data Storage and write Reprojection Data Storage.)

• Computational costs are driven by data scale and the need to run the reductions multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

[MODISAzure pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; the Download Queue and Source Metadata Request Queue feed the Data Collection Stage from the source imagery download sites, while the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue drive the Reprojection, Derivation Reduction, and Analysis Reduction Stages; scientific results are then available for download.]

Approximate scale and cost per stage:

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage

Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download

Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage

Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• They provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                      • Day 2 - Azure as a PaaS
                                                                                                                      • Day 2 - Applications


Code – TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    // Look up an attendee by email, loading the full table into the
    // in-memory dictionary on first use.
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

                                                                                                                        Remember that this only works for tables (or queries on tables) that easily fit in memory This is one of many design patterns for working with tables


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    // Batch all updates and save them in a single round trip.
    foreach (AttendeeEntity attendee in updatedAttendees)
        UpdateAttendee(attendee, false);
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        // Update the tracked copy so the context sends the change.
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}


Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
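As a hedged illustration, here is a minimal sketch of reading the same table from any client, assuming the Microsoft.WindowsAzure.StorageClient library of that era and a connection string setting named "DataConnectionString"; the "Attendees" table name mirrors the example above, and the query shape is illustrative rather than the deck's actual code:

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure.ServiceRuntime;

class AttendeeReader
{
    public static void DumpAttendees()
    {
        // The storage client issues the same REST calls a hand-rolled client would.
        var account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
        TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();

        var attendees = context.CreateQuery<AttendeeEntity>("Attendees").ToList();
        foreach (AttendeeEntity a in attendees)
            System.Console.WriteLine(a.Email);
    }
}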


Tools – Fiddler2

Best Practices

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice – fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real right answer – experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake – splitting code into multiple roles, each not using up its CPU

• Balance using up the CPU vs. having free capacity in times of need

• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• This may not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)

• Data parallelism

• Task parallelism
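As a concrete illustration of the Task Parallel Library bullet above, here is a minimal, hypothetical sketch of data parallelism (Parallel.ForEach over independent work items) and task parallelism (independent Tasks) inside a worker role; the ProcessBlob method and the work-item names are assumptions, not code from the deck:

using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    // Data parallelism: spread independent work items across all cores.
    public void ProcessAll(IEnumerable<string> workItems)
    {
        Parallel.ForEach(workItems, item => ProcessBlob(item));
    }

    // Task parallelism: run two unrelated pieces of work concurrently.
    public void RunSideBySide()
    {
        Task first  = Task.Factory.StartNew(() => ProcessBlob("input-1"));
        Task second = Task.Factory.StartNew(() => ProcessBlob("input-2"));
        Task.WaitAll(first, second);
    }

    private void ProcessBlob(string name)
    {
        // Placeholder for real work (e.g. download, transform, upload).
    }
}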

                                                                                                                        Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

                                                                                                                        Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• There is a trade-off between the risk of failure or a poor user experience from not having excess capacity, and the cost of having idling VMs

                                                                                                                        Performance Cost

                                                                                                                        Storage Costs

• Understand your application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• The service choice can make a big cost difference depending on your app profile

• Caching and compressing help a lot with storage costs

                                                                                                                        Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                                        Compressing Content

1. Gzip all output content (a sketch follows the diagram below)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]
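As a hedged illustration of point 1 above, this ASP.NET sketch (hypothetical, not from the deck) gzips responses only when the client advertises gzip support in its Accept-Encoding header:

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        string acceptEncoding = context.Request.Headers["Accept-Encoding"];

        // Only compress when the browser can decompress on the fly.
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            context.Response.Filter = new GZipStream(
                context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}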

                                                                                                                        Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                                        Research Examples in the Cloud

…on another set of slides

                                                                                                                        Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                                                        Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx


Questions and Discussion…

                                                                                                                        Thank you for hosting me at the Summer School

                                                                                                                        BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g. mpiBLAST)

• Needs special result-reduction processing

Large volume data

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern (a sketch of the split step follows the diagram below)

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data-transfer overhead
• Best practice: use test runs for profiling, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task (a sketch follows this list)
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
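As a hedged illustration of the visibilityTimeout guidance above, this sketch uses the Windows Azure StorageClient queue API of that era; the queue name and the 30-minute estimate are assumptions, not values from the deck:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastTaskFetcher
{
    public CloudQueueMessage GetNextTask(CloudStorageAccount account)
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue taskQueue = queueClient.GetQueueReference("blasttasks");

        // Hide the message for the estimated task duration (derived from
        // test runs); if the worker dies, the task reappears after this
        // timeout. The service caps the value at 2 hours.
        return taskQueue.GetMessage(TimeSpan.FromMinutes(30));
    }
}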

[Diagram: a Splitting task fans the query out into many parallel BLAST tasks, whose outputs are combined by a Merging task.]
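A hedged sketch of the split step in the Split/Join pattern pictured above; the file layout, the helper names, and the partition size (100 sequences, the size the deck later reports as the best choice) are assumptions rather than the project's actual code:

using System.Collections.Generic;
using System.IO;

static class QuerySplitter
{
    // Splits a FASTA query file into partitions of at most partitionSize
    // sequences; each partition file becomes one BLAST task.
    public static List<string> Split(string fastaPath, string outputDir, int partitionSize)
    {
        var partitionPaths = new List<string>();
        var buffer = new List<string>();
        int sequencesInBuffer = 0, partitionIndex = 0;

        foreach (string line in File.ReadLines(fastaPath))
        {
            bool isHeader = line.StartsWith(">");
            if (isHeader && sequencesInBuffer == partitionSize)
            {
                partitionPaths.Add(Flush(buffer, outputDir, partitionIndex++));
                buffer.Clear();
                sequencesInBuffer = 0;
            }
            if (isHeader) sequencesInBuffer++;
            buffer.Add(line);
        }
        if (buffer.Count > 0)
            partitionPaths.Add(Flush(buffer, outputDir, partitionIndex));
        return partitionPaths;
    }

    private static string Flush(List<string> lines, string outputDir, int index)
    {
        string path = Path.Combine(outputDir, "partition-" + index + ".fasta");
        File.WriteAllLines(path, lines);   // one queue message would point at this file
        return path;
    }
}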

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition was the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[AzureBLAST architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role runs the job scheduler, scaling engine, and database-updating role; worker instances pull work from a global dispatch queue; Azure Tables hold the job registry, and Azure Blobs hold the BLAST/NCBI databases and temporary data. Each job executes as a splitting task, many parallel BLAST tasks, and a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table (see the sketch below)
• Fault tolerance: avoid in-memory state
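A minimal, hypothetical sketch of that "persist the accepted job rather than keeping in-memory state" point, using the StorageClient table API of that era; the entity shape and the "JobRegistry" table name are assumptions:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class JobEntity : TableServiceEntity
{
    public string Status { get; set; }
    public DateTime SubmittedAt { get; set; }

    public JobEntity() { }                      // required for deserialization

    public JobEntity(string jobId)
        : base("jobs", jobId)                   // PartitionKey, RowKey
    {
        Status = "Accepted";
        SubmittedAt = DateTime.UtcNow;
    }
}

static class JobRegistry
{
    // Persist the accepted job so nothing is lost if the web role recycles.
    public static void Register(CloudStorageAccount account, string jobId)
    {
        CloudTableClient client = account.CreateCloudTableClient();
        client.CreateTableIfNotExist("JobRegistry");

        TableServiceContext context = client.GetDataServiceContext();
        context.AddObject("JobRegistry", new JobEntity(jobId));
        context.SaveChanges();
    }
}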

[Diagram detail: the web portal and web service feed job registrations to the job scheduler, which works with the scaling engine and the job registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
  • 9,865,668 sequences in total to be queried
  • Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and Northern Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record looks like this:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total.

All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, roughly 30 minutes apart) – this is an update domain at work.

35 nodes experienced blob-writing failures at the same time.

West Europe Data Center: 30,976 tasks were completed and the job was killed.

A reasonable guess: the fault domain was at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stomata (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
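For readers following the formula, here is a minimal C# sketch of the Penman-Monteith computation exactly as written above; it is illustrative only (the parameter names mirror the symbols) and is not code from the MODISAzure service:

static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    public static double ComputeET(
        double delta,    // Δ: d(saturation specific humidity)/dT (Pa/K)
        double rn,       // Rn: net radiation (W/m^2)
        double rhoA,     // ρa: dry air density (kg/m^3)
        double cp,       // cp: specific heat capacity of air (J/(kg·K))
        double deltaQ,   // δq: vapor pressure deficit (Pa)
        double gA,       // ga: conductivity of air (m/s)
        double gS,       // gs: conductivity of plant stomata (m/s)
        double gamma,    // γ: psychrometric constant (~66 Pa/K)
        double lambdaV)  // λv: latent heat of vaporization (J/g)
    {
        double numerator = delta * rn + rhoA * cp * deltaQ * gA;
        double denominator = (delta + gamma * (1.0 + gA / gS)) * lambdaV;
        return numerator / denominator;
    }
}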

                                                                                                                        Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage, also visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

Each stage hands work to the next through its own Azure queue, as sketched below.
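A minimal sketch of that queue-per-stage hand-off, assuming the 2011-era Microsoft.WindowsAzure.StorageClient library; the PipelineStage enum and the queue-naming scheme are illustrative assumptions, not the service's actual identifiers (CreateIfNotExist is spelled CreateIfNotExists in later SDKs).

// Sketch only: queue-per-stage wiring with the assumed storage client library.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public enum PipelineStage { Download, Reprojection, DerivationReduction, AnalysisReduction }

public class StagePipeline
{
    private readonly CloudQueueClient queueClient;

    public StagePipeline(CloudStorageAccount account)
    {
        queueClient = account.CreateCloudQueueClient();
    }

    // Illustrative queue-per-stage naming; the real service uses its own names.
    public void EnqueueRequest(PipelineStage stage, string requestXml)
    {
        CloudQueue queue = queueClient.GetQueueReference(
            stage.ToString().ToLowerInvariant() + "-queue");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(requestXml));
    }
}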

[Architecture diagram: scientists submit requests through the AzureMODIS Service Web Role Portal into a Request Queue; the Data Collection Stage pulls source imagery from the download sites (guided by source metadata) and feeds the Reprojection Queue; the Reprojection, Derivation Reduction (Reduction 1 Queue), and Analysis Reduction (Reduction 2 Queue) Stages follow; scientific results flow through the Download Queue back to the scientists.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• MODISAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work, as sketched below
• Execution status of all jobs and tasks is persisted in Azure Tables
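For concreteness, a sketch of that parse-and-persist step follows. It assumes a hypothetical ReprojectionJob/ReprojectionTask pair and the same TableServiceContext style used in the TableHelper.cs pseudo-code later in this deck; it is not the Service Monitor's actual code.

// Sketch only: ReprojectionJob/ReprojectionTask are hypothetical types; requires
// Microsoft.WindowsAzure.StorageClient and System.Data.Services.Client.
public void ParseAndDispatch(CloudQueueMessage jobMessage,
                             TableServiceContext context,
                             CloudQueue taskQueue)
{
    ReprojectionJob job = ReprojectionJob.FromXml(jobMessage.AsString);   // hypothetical parser

    foreach (ReprojectionTask task in job.SplitIntoTasks())               // one task per target tile
    {
        // Persist status first, so a crashed worker's task can be recovered later.
        context.AddObject("ReprojectionTaskStatus", task);
        // Dispatch the recoverable unit of work to the task queue.
        taskQueue.AddMessage(new CloudQueueMessage(task.ToXml()));
    }
    context.SaveChanges(SaveChangesOptions.Batch);
}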

[Diagram: the MODISAzure Service (Web Role) persists <PipelineStage>JobStatus and places each request on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work items to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science (or other) executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances are dispatched work from the <PipelineStage>Task Queue and read/write <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks up to 3 times
• Maintains all task status

A sketch of that dequeue-and-retry loop follows.
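A minimal sketch of such a loop, assuming the Azure storage client's CloudQueue API; RunTask and MarkTaskFailed are hypothetical placeholders, and the 30-minute visibility timeout is an assumption rather than the service's actual setting.

// Sketch of the GenericWorker's polling loop (lives inside the worker role's Run method).
while (true)
{
    CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
    if (msg == null) { Thread.Sleep(5000); continue; }   // queue empty: back off briefly

    if (msg.DequeueCount > 3)
    {
        // Already retried 3 times: record the failure and drop the poison message.
        MarkTaskFailed(msg);              // hypothetical helper
        taskQueue.DeleteMessage(msg);
        continue;
    }

    RunTask(msg);                         // hypothetical: sandbox the executable, marshal blobs to/from local files
    taskQueue.DeleteMessage(msg);         // delete only on success, so failures become retries
}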

[Diagram: a reprojection request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work through the Job and Task Queues to GenericWorker (Worker Role) instances that read Swath Source Data Storage and write Reprojection Data Storage.]

The supporting Azure Tables:
• ReprojectionJobStatus – each entity specifies a single reprojection job request
• ReprojectionTaskStatus – each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta – query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList – query this table to get the list of satellite scan times that cover a target tile (a query sketch follows)
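A hedged sketch of such a lookup with the table-service LINQ provider; ScanTimeEntity is a hypothetical entity type, and using the tile id as the partition key is an assumption made for illustration.

// Sketch only: requires Microsoft.WindowsAzure.StorageClient (table service LINQ provider).
public IEnumerable<ScanTimeEntity> ScanTimesForTile(TableServiceContext context, string tileId)
{
    return (from scan in context.CreateQuery<ScanTimeEntity>("ScanTimeList")
            where scan.PartitionKey == tileId          // assumption: tile id used as PartitionKey
            select scan).AsTableServiceQuery();        // handles continuation tokens for large results
}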

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost summary, per pipeline stage of the architecture diagram shown earlier:]

Data Collection Stage – 400–500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage – 400 GB, 45K files, 3500 hours, 20–100 workers; $420 CPU, $60 download
Derivation Reduction Stage – 5–7 GB, 55K files, 1800 hours, 20–100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage – <10 GB, ~1K files, 1800 hours, 20–100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed, and they have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                                        • Day 2 - Applications


Pseudo Code – TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }

    if (saveChanges)
    {
        Context.SaveChanges();
    }
}


Application Code – Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.


Tools – Fiddler2

Best Practices

Picking the Right VM Size
• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using much CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency
• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library
    • Data parallelism
    • Task parallelism (see the sketch after this list)
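A minimal Task Parallel Library sketch of both styles; Reproject, DownloadNextBatch, and DeleteExpiredBlobs are placeholder method names, not APIs from this deck.

// Inside a worker role method; requires .NET 4 (System.Threading.Tasks).
// Reproject, DownloadNextBatch, and DeleteExpiredBlobs are placeholders.

// Data parallelism: fan one operation out across all available cores.
Parallel.ForEach(tiles, tile => Reproject(tile));

// Task parallelism: run independent pieces of work concurrently on the same VM.
Task download = Task.Factory.StartNew(() => DownloadNextBatch());
Task cleanup  = Task.Factory.StartNew(() => DeleteExpiredBlobs());
Task.WaitAll(download, cleanup);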

Finding Good Code Neighbors
• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find pieces of code that are intensive in different resources and have them live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately
• Monitor your application and make sure you are scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs


Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile (see the back-of-envelope sketch below)
• Caching and compressing help a lot with storage costs
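A back-of-envelope sketch of that comparison; the two rates are deliberately made-up placeholders (not actual Azure prices), so substitute the current price sheet before drawing conclusions.

// Placeholder rates for illustration only - not actual Azure prices.
const double FlatFeePerMonth = 10.0;                   // flat-fee style (a SQL Azure-like database)
const double PricePerTenThousandTransactions = 0.01;   // per-transaction style (a table-storage-like fee)

static double MonthlyTableCost(long transactionsPerMonth)
{
    return transactionsPerMonth / 10000.0 * PricePerTenThousandTransactions;
}

// A chatty app (500M transactions/month) would pay ~$500 in transactions: the flat fee wins.
// A quiet app (1M transactions/month) would pay ~$1 in transactions: per-transaction wins.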

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places.

Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
   • (A small gzip sketch follows the diagram note below)
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]
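A minimal sketch of gzipping a payload (for example, before writing it to blob storage) with the standard .NET GZipStream; for web-role HTTP responses, IIS dynamic compression achieves the same effect without code.

using System.IO;
using System.IO.Compression;

public static class ContentCompressor
{
    // Compress a byte payload with gzip before storing or sending it.
    public static byte[] Gzip(byte[] uncompressed)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(uncompressed, 0, uncompressed.Length);
            }   // the GZipStream must be closed before reading the buffer
            return output.ToArray();
        }
    }
}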

Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona – Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

Thank you for hosting me at the Summer School!

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700–1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
  • With 100 nodes, the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead (see the splitting sketch below).
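A minimal sketch of the splitting task under this pattern, using the 100-sequences-per-partition figure reported later in these slides; ReadFastaSequences and the downstream upload/enqueue steps are hypothetical placeholders.

// Inside the splitting task; requires System.Collections.Generic.
// ReadFastaSequences is a placeholder for reading the input query set.
const int SequencesPerPartition = 100;   // the best-performing size in the test runs reported below

var partitions = new List<List<string>>();
var current = new List<string>();

foreach (string sequence in ReadFastaSequences(inputPath))
{
    current.Add(sequence);
    if (current.Count == SequencesPerPartition)
    {
        partitions.Add(current);
        current = new List<string>();
    }
}
if (current.Count > 0)
    partitions.Add(current);

// Each partition becomes one BLAST task message; a merging task combines the results when all finish.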

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: an unnecessarily long wait before another instance picks up the task after an instance failure

Best practice
• Estimate the value from the number of pair-bases in the partition, refined with test runs (a sketch follows)
• Watch out for the 2-hour maximum limitation
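A hedged sketch of that estimate; minutesPerPairBase and the 1.5× safety margin are placeholder values to be calibrated with test runs, and the CloudQueue.GetMessage(TimeSpan) overload is from the Azure storage client of that era.

// Estimate the visibility timeout from the partition size, then clamp to the 2-hour queue maximum.
static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition, double minutesPerPairBase)
{
    double estimatedMinutes = pairBasesInPartition * minutesPerPairBase;  // calibrate via test runs
    double padded = estimatedMinutes * 1.5;                               // safety margin (assumption)
    return TimeSpan.FromMinutes(Math.Min(padded, 120));                   // 2-hour hard limit
}

// Usage: dequeue a BLAST task so it stays invisible for roughly its expected run time.
// CloudQueueMessage task = blastTaskQueue.GetMessage(EstimateVisibilityTimeout(pairBases, minutesPerPairBase));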

[Diagram: a Splitting task fans the input out into many BLAST tasks, and a Merging task combines their results.]

Task size vs. performance
• Benefit from the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine against a Job Registry kept in an Azure Table; a database-updating role refreshes the NCBI databases; worker instances pull splitting, BLAST, and merging tasks from a global dispatch queue; Azure Blob storage holds the BLAST databases, temporary data, etc.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track each job's status and logs

Authentication and authorization are based on Live ID.

Each accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram: the Job Portal (Web Portal + Web Service) handles job registration and hands jobs to the Job Scheduler, Scaling Engine, and Job Registry.]

R. palustris as a platform for H2 production (Eric Schadt, Sage Bionetworks; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec
AzureBLAST significantly reduced computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically ~100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (~6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
• 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments; each segment was submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances appeared, the load was redistributed manually (see the partitioning sketch below)
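A minimal sketch of the kind of segmentation described above: split a list of query sequences into segments (one job per deployment), then into smaller partitions (one task each). The sizes and helper names are illustrative assumptions, not the AzureBLAST code.

    using System;
    using System.Collections.Generic;

    static class Segmenter
    {
        // Split a query set into fixed-size chunks (segments or partitions).
        public static List<List<string>> Split(List<string> sequences, int chunkSize)
        {
            var chunks = new List<List<string>>();
            for (int i = 0; i < sequences.Count; i += chunkSize)
                chunks.Add(sequences.GetRange(i, Math.Min(chunkSize, sequences.Count - i)));
            return chunks;
        }
    }

    // Usage (illustrative numbers): 8 segments, one job per deployment; ~2,000 sequences per task.
    // var segments = Segmenter.Split(allSequences, (allSequences.Count + 7) / 8);
    // foreach (var segment in segments) { var partitions = Segmenter.Split(segment, 2000); /* one task per partition */ }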


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g., the task failed to complete).

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Note the gap: task 251774, started at 8:22, has no completion record.)
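A minimal sketch of the log analysis implied above: scan worker logs for tasks that have an "Executing" record but no matching "is done" record. The log format and regular expressions are assumptions based on the sample lines, not a tool shipped with AzureBLAST.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    static class LogCheck
    {
        // Flags task IDs that were started but never reported as done.
        static void Main(string[] args)
        {
            var started = new HashSet<string>();
            var finished = new HashSet<string>();
            var exec = new Regex(@"Executing the task (\d+)");
            var done = new Regex(@"Execution of task (\d+) is done");

            foreach (string line in File.ReadLines(args[0]))
            {
                Match m;
                if ((m = done.Match(line)).Success) finished.Add(m.Groups[1].Value);
                else if ((m = exec.Match(line)).Success) started.Add(m.Groups[1].Value);
            }

            started.ExceptWith(finished);
            foreach (string task in started)
                Console.WriteLine("Task {0} started but never completed", task);
        }
    }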

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, roughly 30 minutes apart); this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed and then the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J g^-1)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma air (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
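To make the formula concrete, here is a small sketch that evaluates the Penman-Monteith expression above for one grid cell; the sample input values are arbitrary illustrations, not MODISAzure data.

    using System;

    static class EtSketch
    {
        // Evaluates ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv) for one grid cell.
        public static double PenmanMonteith(double delta, double rn, double rhoA, double cp,
                                            double deltaQ, double ga, double gs,
                                            double gamma = 66.0, double lambdaV = 2450.0)
        {
            double numerator = delta * rn + rhoA * cp * deltaQ * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }

        // Example with purely illustrative inputs:
        // double et = EtSketch.PenmanMonteith(delta: 145, rn: 400, rhoA: 1.2, cp: 1005,
        //                                     deltaQ: 1000, ga: 0.02, gs: 0.01);
    }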

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal receives requests and feeds the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues. Source imagery is pulled from download sites during the Data Collection stage, flows through the Reprojection, Derivation Reduction, and Analysis Reduction stages, and scientists download the scientific results; source metadata and science results are kept alongside.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, i.e. recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables
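A minimal sketch of the "front door" pattern described above: the web role drops a small job message onto the appropriate queue and relies on table storage for status. The queue and helper names are assumptions for illustration, and the storage client calls reflect the 2011-era SDK.

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    static class FrontDoor
    {
        public static void EnqueueJobRequest(CloudStorageAccount account, string stage, string jobXml)
        {
            // stage is e.g. "download", "reprojection", "reduction1" (assumed names)
            CloudQueueClient queues = account.CreateCloudQueueClient();
            CloudQueue jobQueue = queues.GetQueueReference(stage + "-jobs");
            jobQueue.CreateIfNotExist();
            jobQueue.AddMessage(new CloudQueueMessage(jobXml));  // keep the message small; large data stays in blobs
        }
    }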

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work onto the <PipelineStage> Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
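A minimal sketch of a generic worker loop with the retry behavior described above, using Azure queue visibility timeouts and the message dequeue count. The three-strikes threshold mirrors the bullet; everything else (names, timeouts, the ProcessTask helper) is an assumption.

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    static class GenericWorkerSketch
    {
        public static void WorkerLoop(CloudQueue taskQueue)
        {
            while (true)
            {
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30)); // hide while working
                if (msg == null) { Thread.Sleep(5000); continue; }

                if (msg.DequeueCount > 3)               // already failed 3 times: give up
                {
                    // record the failure in the task status table (not shown), then drop the message
                    taskQueue.DeleteMessage(msg);
                    continue;
                }

                try
                {
                    ProcessTask(msg.AsString);          // run the sandboxed executable, stage blobs, etc.
                    taskQueue.DeleteMessage(msg);       // delete only on success
                }
                catch
                {
                    // swallow: the message becomes visible again after the timeout and is retried
                }
            }
        }

        static void ProcessTask(string taskDescription) { /* assumed stage-specific work */ }
    }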

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), then dispatches to the Job and Task Queues for GenericWorker (Worker Role) instances. Two metadata tables support the lookup: SwathGranuleMeta (query this table to get geo-metadata, e.g. boundaries, for each swath tile) and ScanTimeList (query this table to get the list of satellite scan times that cover a target tile). Tile data lives in Swath Source Data Storage and Reprojection Data Storage.]

• Computational costs are driven by data scale and the need to run the reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost-annotated pipeline diagram (AzureMODIS Service Web Role Portal plus the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues); approximate per-stage figures:]
• Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3,500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction stage: 5-7 GB, 55K files, 1,800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction stage: <10 GB, ~1K files, 1,800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage
Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                          • Day 2 - Azure as a PaaS
                                                                                                                          • Day 2 - Applications


Application Code - Cloud Tables

    private void SaveButton_Click(object sender, RoutedEventArgs e)
    {
        // Write to table
        tableHelper.UpdateAttendees(attendees);
    }

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
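The slide's tableHelper is not shown; below is a hedged sketch of what an UpdateAttendees helper might look like with the 2011-era table storage client. The Attendee entity and the "Attendees" table name are assumptions for illustration.

    using System.Collections.Generic;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class Attendee : TableServiceEntity
    {
        public Attendee() { }
        public Attendee(string eventId, string email) : base(eventId, email) { }  // PartitionKey, RowKey
        public string Name { get; set; }
    }

    public class TableHelper
    {
        private readonly CloudTableClient client;

        public TableHelper(CloudStorageAccount account)
        {
            client = account.CreateCloudTableClient();
            client.CreateTableIfNotExist("Attendees");
        }

        public void UpdateAttendees(IEnumerable<Attendee> attendees)
        {
            TableServiceContext ctx = client.GetDataServiceContext();
            foreach (Attendee a in attendees)
            {
                ctx.AttachTo("Attendees", a, "*");   // "*" ETag: unconditional, last-writer-wins update
                ctx.UpdateObject(a);
            }
            ctx.SaveChangesWithRetries();
        }
    }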


Tools - Fiddler2

                                                                                                                            Best Practices

                                                                                                                            Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                            Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each not using up its CPU
• Balance using up the CPU against having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                                            Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O completion ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library (see the sketch after this list)
  • Data parallelism
  • Task parallelism
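A brief .NET 4 Task Parallel Library sketch showing both flavors mentioned above; the work items are placeholders.

    using System;
    using System.Threading.Tasks;

    class TplDemo
    {
        static void Main()
        {
            // Data parallelism: apply the same operation to every element.
            int[] inputs = { 1, 2, 3, 4, 5, 6, 7, 8 };
            Parallel.For(0, inputs.Length, i =>
            {
                inputs[i] = inputs[i] * inputs[i];   // placeholder per-item work
            });

            // Task parallelism: run different operations concurrently.
            Task download = Task.Factory.StartNew(() => Console.WriteLine("downloading..."));
            Task transform = Task.Factory.StartNew(() => Console.WriteLine("transforming..."));
            Task.WaitAll(download, transform);
        }
    }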

                                                                                                                            Finding Good Code Neighbors

• Typically, code falls into one or more of the categories listed below
• Find code that is intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

Categories: Memory-intensive, CPU-intensive, Network I/O-intensive, Storage I/O-intensive

                                                                                                                            Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• It is a trade-off between the risk of failure or poor user experience from lacking excess capacity, and the cost of idling VMs

[Trade-off: performance vs. cost]

                                                                                                                            Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs
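To illustrate how the billing model interacts with an app's profile, here is a toy comparison; the rates below are hypothetical placeholders, not real (2011 or current) Azure prices, so substitute the actual rate card before drawing conclusions.

    using System;

    static class StorageCostSketch
    {
        static void Main()
        {
            // Hypothetical illustrative rates (NOT real prices):
            double sqlAzureFlatFeePerMonth = 10.0;       // flat fee per database per month
            double tablePricePer10kTransactions = 0.01;  // per-transaction billing
            double tableStoragePerGbMonth = 0.15;        // per-GB-month storage charge

            double monthlyTransactions = 50e6;           // app profile: chatty, many small entities
            double storedGb = 2.0;

            double tableCost = (monthlyTransactions / 10000.0) * tablePricePer10kTransactions
                             + storedGb * tableStoragePerGbMonth;

            Console.WriteLine("Tables: {0:C} per month vs. SQL Azure flat fee: {1:C}",
                              tableCost, sqlAzureFlatFeePerMonth);
            // A chatty profile makes per-transaction billing dominate; a profile with few,
            // large requests can make the flat fee the cheaper choice.
        }
    }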

                                                                                                                            Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places.
Sending fewer things means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                                            Compressing Content

1. Gzip all output content (a compression sketch follows this list)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
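A short sketch of gzip-compressing response content in .NET; this is generic framework usage (System.IO.Compression), not code from the deck.

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static class GzipHelper
    {
        // Compresses text (e.g. HTML/JSON output) before storing or sending it.
        public static byte[] Compress(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);
                }                                        // closing the GZipStream flushes the gzip footer
                return output.ToArray();
            }
        }
    }

    // Remember to set "Content-Encoding: gzip" on the HTTP response when serving compressed bytes.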

[Diagram: Uncompressed Content -> Gzip -> Compressed Content]

                                                                                                                            Minify JavaScript

Minify CSS

                                                                                                                            Minify Images

                                                                                                                            Best Practices Summary

Doing 'less' is the key to saving costs.
Measure everything.
Know your application profile inside and out.

                                                                                                                            Research Examples in the Cloud

…on another set of slides

                                                                                                                            Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
(a schematic map/reduce example follows)
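To illustrate the programming pattern itself (not the Daytona API, which the deck only describes as REST-based), here is a schematic in-memory word-count map/reduce in C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class MapReduceSketch
    {
        // map: document -> (word, 1) pairs;  reduce: (word, counts) -> (word, total)
        static void Main()
        {
            string[] documents = { "the cloud is elastic", "the cloud scales" };

            var mapped = documents
                .SelectMany(doc => doc.Split(' ')
                                      .Select(word => new KeyValuePair<string, int>(word, 1)));

            var reduced = mapped
                .GroupBy(pair => pair.Key)
                .Select(group => new { Word = group.Key, Count = group.Sum(p => p.Value) });

            foreach (var result in reduced)
                Console.WriteLine("{0}: {1}", result.Word, result.Count);
        }
    }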


                                                                                                                            Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                            Thank you for hosting me at the Summer School

                                                                                                                            BLAST (Basic Local Alignment Search Tool)

• The most important software tool in bioinformatics

• Identifies similarity between biological sequences

                                                                                                                            Computationally intensive

• Large number of pairwise alignment operations

• A single BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

                                                                                                                            It is easy to parallelize BLAST

• Segment the input
  • Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volumes of data

• A typical BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth demand could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the generally suggested application model
  • Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

(a sketch of the splitting step follows the reference below)

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.
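As a rough illustration of the query-segmentation step (a sketch only; AzureBLAST's actual splitter is not shown in these slides), the input FASTA query file can be cut into fixed-size partitions of sequences, each of which becomes one queued BLAST task. File names and the partition size of 100 (the value found best later in this deck) are illustrative assumptions:

    # Minimal sketch: split a FASTA query file into partitions of N sequences.
    from typing import Iterator, List

    def read_fasta(path: str) -> Iterator[str]:
        """Yield one FASTA record (header line plus sequence lines) at a time."""
        record: List[str] = []
        with open(path) as handle:
            for line in handle:
                if line.startswith(">") and record:
                    yield "".join(record)
                    record = []
                record.append(line)
            if record:
                yield "".join(record)

    def _flush(batch: List[str], index: int) -> str:
        name = f"partition_{index:05d}.fasta"
        with open(name, "w") as out:
            out.writelines(batch)
        return name

    def split_queries(path: str, sequences_per_partition: int = 100) -> List[str]:
        """Write partition files for the input and return their names."""
        partitions, batch = [], []
        for i, record in enumerate(read_fasta(path), start=1):
            batch.append(record)
            if i % sequences_per_partition == 0:
                partitions.append(_flush(batch, len(partitions)))
                batch = []
        if batch:
            partitions.append(_flush(batch, len(partitions)))
        return partitions

Each partition file would then be uploaded to blob storage and a task message referencing it placed on the dispatch queue for the workers.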

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST

• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance

• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time

• Too small: repeated computation

• Too large: an unnecessarily long wait before the task reappears if an instance fails

Best practice:
• Estimate the value from the number of base pairs in the partition and from test runs

• Watch out for the 2-hour maximum limitation

(a queue-timeout sketch follows below)
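A hedged sketch of the idea using today's azure-storage-queue Python SDK (the original AzureBLAST used the .NET StorageClient of 2010, so names differ; the connection string, queue name, and the BLAST helper are assumptions):

    # Sketch: dequeue a BLAST task with an explicit visibility timeout.
    # If the worker dies before delete_message(), the message reappears after
    # the timeout and another worker can pick it up.
    from azure.storage.queue import QueueClient

    queue = QueueClient.from_connection_string(
        conn_str="<storage-connection-string>",   # assumption: supplied by configuration
        queue_name="blast-tasks",                 # assumption: dispatch queue name
    )

    def run_blast_task(task_description: str) -> None:
        # Placeholder for invoking NCBI-BLAST on the named partition.
        print("running", task_description)

    # Rough estimate of task run time (seconds), derived from test runs
    # and the number of base pairs in the partition.
    estimated_runtime = 30 * 60

    for message in queue.receive_messages(visibility_timeout=estimated_runtime):
        run_blast_task(message.content)
        queue.delete_message(message)   # delete only after the task really finished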

[Diagram: a splitting task fans the query out into many BLAST tasks that run in parallel; a merging task combines their results.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition was the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• Extra-large instances gave the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the web portal and web service for job registration; a Job Management Role contains the job scheduler and scaling engine and records jobs in the Job Registry (Azure Table); tasks are dispatched through a global dispatch queue to worker instances; Azure Blob storage holds the BLAST/NCBI databases and temporary data, maintained by a database-updating role. Within a job, a splitting task fans out into parallel BLAST tasks followed by a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state
(a minimal table-storage sketch follows below)
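A hedged sketch of the job-registry idea with the current azure-data-tables Python SDK (the 2010 system used the .NET StorageClient; the table and property names here are assumptions):

    # Sketch: persist an accepted job in an Azure Table so no state lives only
    # in memory; if the web role restarts, the job record survives.
    import uuid
    from datetime import datetime, timezone
    from azure.data.tables import TableServiceClient

    service = TableServiceClient.from_connection_string("<storage-connection-string>")
    registry = service.create_table_if_not_exists("jobregistry")   # assumed table name

    def register_job(user_id: str, input_blob_url: str) -> str:
        job_id = str(uuid.uuid4())
        registry.create_entity({
            "PartitionKey": user_id,      # group a user's jobs in one partition
            "RowKey": job_id,             # unique per job
            "Status": "Accepted",
            "InputBlob": input_blob_url,
            "SubmittedAt": datetime.now(timezone.utc).isoformat(),
        })
        return job_id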

[Diagram: the web portal and web service register jobs; the job scheduler, job portal, and scaling engine coordinate work recorded in the job registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min

• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
  • 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions

• When load imbalance appears, redistribute the load manually


• Total size of the output results is ~230 GB

• Total number of hits: 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• Based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record pair is an "Executing the task …" line followed by a matching "Execution of task … is done" line

Otherwise something is wrong (e.g., the task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, each group gone for ~30 mins: this is an update domain at work

35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks completed, then the job was killed

A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
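For concreteness, a direct transcription of the formula above into Python, using the symbols defined on this slide (unit handling is simplified and the sample numbers in the call are made up, not MODIS values):

    def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                           gamma=66.0, lambda_v=2260.0):
        """Evapotranspiration from the Penman-Monteith formulation above.

        delta    : rate of change of saturation specific humidity with air temp (Pa/K)
        r_n      : net radiation (W/m^2)
        rho_a    : dry air density (kg/m^3)
        c_p      : specific heat capacity of air (J/(kg*K))
        dq       : vapor pressure deficit (Pa)
        g_a, g_s : conductivities of air and plant stoma (m/s)
        gamma    : psychrometric constant (~66 Pa/K)
        lambda_v : latent heat of vaporization (~2260 J/g)
        """
        numerator = delta * r_n + rho_a * c_p * dq * g_a
        denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
        return numerator / denominator

    # Illustrative call with made-up magnitudes, just to show the shape of the API.
    print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                             c_p=1005.0, dq=1000.0, g_a=0.02, g_s=0.01))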

Data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a minimal nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
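As a rough, hedged illustration of what a nearest-neighbor resampling step does (the real MODIS reprojection handles sinusoidal projection math and swath geometry that this toy version skips):

    import numpy as np

    def nearest_neighbor_resample(src, src_lat, src_lon, dst_lat, dst_lon):
        """Resample a source raster onto a destination grid by nearest neighbor.

        src     : 2-D array of source pixel values
        src_lat : 1-D array of source row latitudes
        src_lon : 1-D array of source column longitudes
        dst_lat : 1-D array of destination row latitudes
        dst_lon : 1-D array of destination column longitudes
        """
        # For every destination coordinate, find the closest source coordinate.
        row_idx = np.abs(src_lat[:, None] - dst_lat[None, :]).argmin(axis=0)
        col_idx = np.abs(src_lon[:, None] - dst_lon[None, :]).argmin(axis=0)
        # Index the source raster with the matched rows and columns.
        return src[np.ix_(row_idx, col_idx)]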

[Architecture diagram: the AzureMODIS Service web role portal receives requests; work flows through the Download, Reprojection, Reduction 1, and Reduction 2 queues, consulting the source metadata store; source imagery is fetched from NASA download sites in the data collection stage, then passes through the reprojection, derivation reduction, and analysis reduction stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues requests to the appropriate Download, Reprojection, or Reduction job queue

• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work on the <PipelineStage>Job queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task queue.]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshals all storage between Azure blob storage and local Azure worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances pull dispatched work from the <PipelineStage>Task queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status
(a worker-loop sketch follows below)
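A hedged sketch of that worker loop with today's Python queue SDK (the real GenericWorker was a .NET worker role; the queue name, retry handling, and the task runner below are assumptions):

    # Sketch: generic worker loop. Dequeue a task, run it, delete on success,
    # and give up on messages that have already been retried 3 times.
    import time
    from azure.storage.queue import QueueClient

    MAX_ATTEMPTS = 3

    queue = QueueClient.from_connection_string(
        conn_str="<storage-connection-string>",   # assumption
        queue_name="reprojection-tasks",          # assumption: one queue per pipeline stage
    )

    def run_task(task_description: str) -> None:
        # Placeholder for sandboxing and running the science executable.
        print("running", task_description)

    while True:
        for message in queue.receive_messages(visibility_timeout=15 * 60):
            if message.dequeue_count > MAX_ATTEMPTS:
                queue.delete_message(message)     # poison message: stop retrying
                continue                          # (status would be marked Failed in the table)
            try:
                run_task(message.content)
                queue.delete_message(message)     # success: remove from the queue
            except Exception:
                pass  # leave it; it reappears after the visibility timeout (an implicit retry)
        time.sleep(5)  # brief idle wait when the queue is empty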

[Diagram: a Reprojection Request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work through the job and task queues to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Small with respect to the people costs, even at graduate-student rates


Approximate scale and cost per pipeline stage:

• Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; costs: $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; costs: $420 CPU, $60 download
• Derivation reduction stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; costs: $216 CPU, $1 download, $6 storage
• Analysis reduction stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers; costs: $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault-tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                            • Day 2 - Azure as a PaaS
                                                                                                                            • Day 2 - Applications


Tools - Fiddler2

                                                                                                                              Best Practices

                                                                                                                              Picking the Right VM Size

• Having the correct VM size can make a big difference in costs

• Fundamental choice - fewer, larger VMs vs. many smaller instances

• If you scale better than linearly across cores, larger VMs could save you money

• It is pretty rare to see linear scaling across 8 cores

• More instances may provide better uptime and reliability (more failures are needed to take your service down)

• The only real answer - experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                              Using Your VM to the Maximum

Remember:

• 1 role instance == 1 VM running Windows

• 1 role instance = one specific task for your code

• You're paying for the entire VM, so why not use it?

• Common mistake - splitting code into multiple roles, each barely using the CPU

• Balance using up the CPU vs. keeping free capacity for times of need

• There are multiple ways to use your CPU to the fullest

                                                                                                                              Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency

• May not be ideal if the number of active processes exceeds the number of cores

• Use multithreading aggressively

• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads

• In .NET 4, use the Task Parallel Library (see the sketch after this list)

• Data parallelism

• Task parallelism
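A minimal sketch of both flavors using the .NET 4 Task Parallel Library; the partition list and the ProcessPartition/CompressLogs/DeleteTemporaryFiles methods are hypothetical stand-ins for real role work items.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencySketch
{
    static void Main()
    {
        var partitions = new List<string> { "part-001", "part-002", "part-003" };

        // Data parallelism: apply the same operation to every partition,
        // letting the TPL spread the work across the available cores.
        Parallel.ForEach(partitions, partition => ProcessPartition(partition));

        // Task parallelism: run unrelated pieces of work concurrently.
        Task compress = Task.Factory.StartNew(() => CompressLogs());
        Task cleanup = Task.Factory.StartNew(() => DeleteTemporaryFiles());
        Task.WaitAll(compress, cleanup);
    }

    // Hypothetical work items standing in for real role logic.
    static void ProcessPartition(string name) { Console.WriteLine("processing " + name); }
    static void CompressLogs() { Console.WriteLine("compressing logs"); }
    static void DeleteTemporaryFiles() { Console.WriteLine("cleaning up"); }
}
```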

                                                                                                                              Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive

• Find code that is intensive in different resources to live together

• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                                              Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• Trade-off between the risk of failure / poor user experience from not having excess capacity and the cost of keeping idle VMs (performance vs. cost); a monitoring sketch follows below

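A minimal sketch of the monitoring half of that trade-off, assuming the classic Microsoft.WindowsAzure.StorageClient queue API; the thresholds are made up, and actually changing the instance count would go through the Service Management API or an operator (not shown).

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ScalingSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your storage connection string>");
        CloudQueue tasks = account.CreateCloudQueueClient().GetQueueReference("tasks");

        // Refreshes the cached ApproximateMessageCount property.
        tasks.RetrieveApproximateMessageCount();
        int backlog = tasks.ApproximateMessageCount ?? 0;

        // Hypothetical policy: one worker per 100 queued tasks, clamped to a sane range.
        int target = Math.Max(2, Math.Min(20, backlog / 100 + 1));
        Console.WriteLine("Queue backlog {0} -> target worker count {1}", backlog, target);
        // Scaling to 'target' instances would be done via the Service Management API.
    }
}
```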

                                                                                                                              Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile

• E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference based on your app profile

• Caching and compression help a lot with storage costs

A back-of-the-envelope comparison of the flat-fee vs. per-transaction choice follows below.
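A simple sketch of that comparison; the prices are illustrative 2011-era assumptions, not authoritative figures, and the application profile is invented.

```csharp
using System;

class StorageCostSketch
{
    static void Main()
    {
        // Assumed illustrative prices - check current pricing before relying on them.
        const double tableStoragePerGbMonth = 0.15; // $/GB per month
        const double tableTxnPer10K = 0.01;         // $ per 10,000 transactions
        const double sqlAzureFlatFee = 9.99;        // $/month for a small database

        // Invented application profile.
        double gbStored = 5;                // 5 GB of entities
        double monthlyTransactions = 50e6;  // 50 million storage transactions per month

        double tableCost = gbStored * tableStoragePerGbMonth
                         + (monthlyTransactions / 10000.0) * tableTxnPer10K;

        Console.WriteLine("Azure Tables: ${0:F2}/month, SQL Azure: ${1:F2}/month",
                          tableCost, sqlAzureFlatFee);
        // A transaction-heavy profile can make the flat-fee option cheaper, and vice versa.
    }
}
```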

                                                                                                                              Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things also means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

                                                                                                                              Compressing Content

1. Gzip all output content (a sketch of doing this in ASP.NET follows below)

• All modern browsers can decompress on the fly

• Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

• Use Portable Network Graphics (PNGs)

• Crush your PNGs

• Strip needless metadata

• Make all PNGs palette PNGs
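A minimal sketch of gzipping ASP.NET output with System.IO.Compression, assuming it is called early in the request pipeline (e.g., from Application_BeginRequest in Global.asax) before any response body is written.

```csharp
using System;
using System.IO.Compression;
using System.Web;

public static class GzipSketch
{
    // Call once per request, before any response body is written.
    public static void CompressResponse(HttpContext context)
    {
        string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;

        if (acceptEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // Wrap the output stream so everything written afterwards is gzip-compressed.
            context.Response.Filter = new GZipStream(context.Response.Filter,
                                                     CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
            context.Response.AppendHeader("Vary", "Accept-Encoding");
        }
    }
}
```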

[Diagram: uncompressed content -> Gzip + minify JavaScript / CSS / images -> compressed content]

                                                                                                                              Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                                              Research Examples in the Cloud

...on another set of slides

                                                                                                                              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                                                              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx


Questions and Discussion...

                                                                                                                              Thank you for hosting me at the Summer School

                                                                                                                              BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700-1000 CPU hours

• Sequence databases are growing exponentially

• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

• Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)

• Needs special result-reduction processing

Large volume data

• A normal BLAST database can be as large as 10 GB

• 100 nodes means the peak storage bandwidth could reach 1 TB

• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1/2/4/8 for the small, medium, large and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • Data-transfer overhead

Best practice: do test runs to profile, and set the partition size to mitigate the overhead (a splitting sketch follows below).
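A minimal sketch of the splitting side of that pattern, assuming FASTA input, one blob per partition, and one queue message per partition for the workers; the 100-sequences-per-partition value echoes the profiling result mentioned later, and the queue/container names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class SplittingTaskSketch
{
    const int SequencesPerPartition = 100; // chosen by profiling test runs

    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your storage connection string>");
        CloudBlobContainer container = account.CreateCloudBlobClient()
                                              .GetContainerReference("partitions");
        container.CreateIfNotExist();
        CloudQueue dispatch = account.CreateCloudQueueClient().GetQueueReference("dispatch");
        dispatch.CreateIfNotExist();

        var buffer = new List<string>();
        int sequencesInBuffer = 0, partitionId = 0;

        foreach (string line in File.ReadAllLines("input.fasta"))
        {
            // A FASTA record starts with a '>' header line.
            if (line.StartsWith(">") && sequencesInBuffer == SequencesPerPartition)
            {
                Flush(container, dispatch, buffer, partitionId++);
                buffer.Clear();
                sequencesInBuffer = 0;
            }
            if (line.StartsWith(">")) sequencesInBuffer++;
            buffer.Add(line);
        }
        if (buffer.Count > 0) Flush(container, dispatch, buffer, partitionId);
    }

    static void Flush(CloudBlobContainer container, CloudQueue dispatch,
                      List<string> lines, int id)
    {
        string blobName = "partition-" + id + ".fasta";
        container.GetBlobReference(blobName).UploadText(string.Join("\n", lines.ToArray()));
        dispatch.AddMessage(new CloudQueueMessage(blobName)); // one BLAST task per partition
    }
}
```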

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of an instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation

A worker-side sketch follows below.
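A worker-side sketch under the same assumptions as the splitting sketch; the run-time estimate, the blastall arguments and the output handling are hypothetical, and "-a" is set to the instance's core count (1/2/4/8) as described above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastWorkerSketch
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your storage connection string>");
        CloudQueue dispatch = account.CreateCloudQueueClient().GetQueueReference("dispatch");
        CloudBlobContainer container = account.CreateCloudBlobClient()
                                              .GetContainerReference("partitions");

        while (true)
        {
            // Visibility timeout = our run-time estimate, capped at the 2-hour queue maximum.
            CloudQueueMessage task = dispatch.GetMessage(EstimateRunTime());
            if (task == null) { Thread.Sleep(5000); continue; }

            string partition = task.AsString;
            container.GetBlobReference(partition).DownloadToFile(partition);

            // "-a" = number of cores on this instance (1/2/4/8).
            var psi = new ProcessStartInfo("blastall",
                "-p blastp -d nr -i " + partition + " -o " + partition + ".out" +
                " -a " + Environment.ProcessorCount);
            using (Process blast = Process.Start(psi)) { blast.WaitForExit(); }

            container.GetBlobReference(partition + ".out").UploadFile(partition + ".out");
            dispatch.DeleteMessage(task); // delete only after the result is safely stored
        }
    }

    // Hypothetical estimate; AzureBlast derived this from the pair-bases in the partition.
    static TimeSpan EstimateRunTime()
    {
        TimeSpan estimate = TimeSpan.FromMinutes(30);
        TimeSpan cap = TimeSpan.FromHours(2);
        return estimate < cap ? estimate : cap;
    }
}
```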

[Diagram: Splitting task -> BLAST task, BLAST task, BLAST task, ... (in parallel) -> Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram (AzureBLAST): a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler, Scaling Engine and database-updating role; tasks flow through a global dispatch queue to Worker instances; Azure Tables hold the Job Registry; Azure Blob storage holds the NCBI/BLAST databases, temporary data, etc.]


An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state

[Diagram detail: the Web Portal and Web Service perform job registration into the Job Registry; the Job Scheduler and Scaling Engine pick up registered jobs.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time...

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments of this scale are usually infeasible for most scientists

• Allocated a total of ~4000 cores
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB

• The number of total hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working-instance time should be 6-8 days

• Look into the log data to analyze what took place...


A normal log record is a matched pair: "Executing the task N" followed by "Execution of task N is done".
Otherwise something is wrong (e.g., the task failed to complete). A parsing sketch follows the log excerpt below.

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774   (never completes - something went wrong)
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins
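A minimal sketch of this kind of log analysis, assuming the records have been collected into a plain text file in the format shown above; it reports task IDs that were started but never logged a completion record.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class LogAnalysisSketch
{
    static void Main()
    {
        var started = new HashSet<string>();
        var completed = new HashSet<string>();

        // Assumed input: one record per line, e.g.
        // "3/31/2010 6:14 RD00155D3611B0 Executing the task 251523"
        foreach (string line in File.ReadLines("worker.log"))
        {
            Match m = Regex.Match(line, @"Executing the task (\d+)");
            if (m.Success) started.Add(m.Groups[1].Value);

            m = Regex.Match(line, @"Execution of task (\d+) is done");
            if (m.Success) completed.Add(m.Groups[1].Value);
        }

        started.ExceptWith(completed);
        foreach (string taskId in started)
            Console.WriteLine("Task {0} started but never completed (lost or failed)", taskId);
    }
}
```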

North Europe Data Center: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 mins) - this is an Update Domain rolling through

• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed

• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" - Irish proverb

ET = water volume evapotranspired (m^3 s^-1 m^-2)

Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)

λv = latent heat of vaporization (J g^-1)

Rn = net radiation (W m^-2)

cp = specific heat capacity of air (J kg^-1 K^-1)

ρa = dry air density (kg m^-3)

δq = vapor pressure deficit (Pa)

ga = conductivity of air (inverse of ra) (m s^-1)

gs = conductivity of plant stoma, air (inverse of rs) (m s^-1)

γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction

• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
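A direct transcription of the Penman-Monteith formula above into code, as a minimal sketch; the sample input values are made-up placeholders, not data from the project.

```csharp
using System;

class PenmanMonteithSketch
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    static double Evapotranspiration(
        double delta,    // Δ: rate of change of saturation specific humidity (Pa K^-1)
        double rn,       // Rn: net radiation (W m^-2)
        double rhoA,     // ρa: dry air density (kg m^-3)
        double cp,       // cp: specific heat capacity of air (J kg^-1 K^-1)
        double dq,       // δq: vapor pressure deficit (Pa)
        double ga,       // ga: conductivity of air (m s^-1)
        double gs,       // gs: conductivity of plant stoma (m s^-1)
        double lambdaV,  // λv: latent heat of vaporization (J g^-1)
        double gamma = 66.0) // γ: psychrometric constant (Pa K^-1)
    {
        return (delta * rn + rhoA * cp * dq * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }

    static void Main()
    {
        // Made-up illustrative inputs; real values come from MODIS, FLUXNET, NCEP/NCAR, etc.
        double et = Evapotranspiration(145.0, 400.0, 1.2, 1005.0, 1200.0, 0.02, 0.01, 2450.0);
        Console.WriteLine("ET = {0:E3}", et);
    }
}
```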

Inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; requests flow through the Request Queue to the Download Queue (Data Collection Stage, pulling from the source imagery download sites and tracking source metadata), then to the Reprojection Queue (Reprojection Stage), the Reduction 1 Queue (Derivation Reduction Stage) and the Reduction 2 Queue (Analysis Reduction Stage); scientific results are made available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues work to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Figure: task flow – the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks and read/write <Input>Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
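A minimal sketch of the GenericWorker loop just described, again assuming the classic StorageClient queue API; ProcessTask and MarkTaskFailed are illustrative placeholders. A failed task simply reappears on the queue when its visibility timeout expires, and the message's DequeueCount provides the 3-retry limit.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

class GenericWorkerLoop
{
    const int MaxRetries = 3;

    static void Run(CloudQueue taskQueue)
    {
        while (true)
        {
            // Hide the message long enough to finish the task (see the visibilityTimeout notes later).
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(90));
            if (msg == null) { Thread.Sleep(TimeSpan.FromSeconds(10)); continue; }

            if (msg.DequeueCount > MaxRetries)
            {
                MarkTaskFailed(msg.AsString);        // record the permanent failure in the status table
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                ProcessTask(msg.AsString);           // sandboxed executable plus blob marshalling
                taskQueue.DeleteMessage(msg);        // only delete on success
            }
            catch (Exception)
            {
                // Do nothing: the message becomes visible again after the timeout and is retried.
            }
        }
    }

    static void ProcessTask(string task) { /* ... */ }
    static void MarkTaskFailed(string task) { /* ... */ }
}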

[Figure: reprojection job flow – a Reprojection Request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role), parsed and persisted as ReprojectionTaskStatus, and dispatched from the Job Queue to the Task Queue for GenericWorker (Worker Role) instances; tasks point to the ScanTimeList and SwathGranuleMeta tables, Reprojection Data Storage, and Swath Source Data Storage]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query SwathGranuleMeta to get geo-metadata (e.g. boundaries) for each swath tile
• Query ScanTimeList to get the list of satellite scan times that cover a target tile
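As an example of how a worker might consult these tables, here is a hypothetical query against ScanTimeList using the table service LINQ provider of the classic SDK; the ScanTimeEntry entity and its properties are illustrative, not the actual MODISAzure schema.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class ScanTimeEntry : TableServiceEntity
{
    public DateTime ScanTime { get; set; }                   // a satellite scan time covering the tile
}

public static class ScanTimeLookup
{
    // Return all scan times recorded for a target tile (PartitionKey assumed to be the tile id).
    public static List<ScanTimeEntry> ForTile(CloudStorageAccount account, string tileId)
    {
        TableServiceContext ctx = account.CreateCloudTableClient().GetDataServiceContext();
        return ctx.CreateQuery<ScanTimeEntry>("ScanTimeList")
                  .Where(e => e.PartitionKey == tileId)
                  .ToList();
    }
}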

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Figure: MODISAzure pipeline (Data Collection, Reprojection, Derivation Reduction, Analysis Reduction, and Scientific Results Download stages, with their Download, Reprojection, Reduction 1, Reduction 2, and Request Queues plus Source Metadata), accessed through the AzureMODIS Service Web Role Portal and annotated with the per-stage volumes and costs below]

Stage                        Data volume   Files       Compute / throughput   Workers   Cost
Data Collection (download)   400-500 GB    60K files   10 MB/sec, 11 hours    <10       $50 upload, $450 storage
Reprojection                 400 GB        45K files   3500 hours             20-100    $420 cpu, $60 download
Derivation Reduction         5-7 GB        55K files   1800 hours             20-100    $216 cpu, $1 download, $6 storage
Analysis Reduction           <10 GB        ~1K files   1800 hours             20-100    $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• They provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                              • Day 2 - Azure as a PaaS
                                                                                                                              • Day 2 - Applications

                                                                                                                                Best Practices

                                                                                                                                Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice – fewer, larger VMs vs. many smaller instances
  • If you scale better than linearly across cores, larger VMs could save you money
  • Pretty rare to see linear scaling across 8 cores
  • More instances may provide better uptime and reliability (more failures are needed to take your service down)
• Only real right answer – experiment with multiple sizes and instance counts to measure and find what is ideal for you

                                                                                                                                Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake – splitting code into multiple roles, each not using much CPU
• Balance between using up CPU and having free capacity in times of need
• There are multiple ways to use your CPU to the fullest

                                                                                                                                Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (see the sketch below)
    • Data parallelism
    • Task parallelism
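A minimal sketch of both styles with the .NET 4 Task Parallel Library; the ProcessTile, FetchPendingTiles, and DeleteExpiredResults names are illustrative placeholders.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyExamples
{
    // Data parallelism: apply the same operation to every item, letting the TPL
    // decide how many threads to use for the cores available on this VM size.
    static void ReprojectAll(IEnumerable<string> tileBlobNames)
    {
        Parallel.ForEach(tileBlobNames, blobName =>
        {
            ProcessTile(blobName);                    // CPU-bound work per tile
        });
    }

    // Task parallelism: run independent pieces of work concurrently and wait for both.
    static void DoIndependentWork()
    {
        Task download = Task.Factory.StartNew(() => FetchPendingTiles());
        Task cleanup  = Task.Factory.StartNew(() => DeleteExpiredResults());
        Task.WaitAll(download, cleanup);
    }

    static void ProcessTile(string blobName) { /* ... */ }
    static void FetchPendingTiles() { /* ... */ }
    static void DeleteExpiredResults() { /* ... */ }
}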

                                                                                                                                Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory intensive, CPU intensive, network I/O intensive, or storage I/O intensive
• Find pieces of code that are intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

                                                                                                                                Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale (a simple heuristic is sketched below)
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs (performance vs. cost)
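As an illustration of that trade-off, a hypothetical scaling heuristic (not part of the Windows Azure SDK) might look like the following; every threshold here is an assumption you would calibrate by measurement.

using System;

public static class ScalingHeuristic
{
    // Hypothetical rule: aim to drain the current queue backlog in about an hour, scale down
    // slowly to avoid thrashing, and stay within fixed bounds.
    public static int TargetInstanceCount(int queueLength, int currentInstances,
                                          int tasksPerInstancePerHour, int minInstances, int maxInstances)
    {
        int needed = (int)Math.Ceiling((double)queueLength / tasksPerInstancePerHour);

        // VMs take minutes to start and are billed while running, so step down one at a time.
        if (needed < currentInstances)
            needed = currentInstances - 1;

        return Math.Min(maxInstances, Math.Max(minInstances, needed));
    }
}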

                                                                                                                                Storage Costs

• Understand an application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile
• Caching and compressing help a lot with storage costs

                                                                                                                                Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile
Sending fewer things over the wire often means getting fewer things from storage
Saving bandwidth costs often leads to savings in other places
Sending fewer things means your VM has time to do other tasks
All of these tips have the side benefit of improving your web app's performance and user experience

                                                                                                                                Compressing Content

1. Gzip all output content (see the sketch after the figure below)
  • All modern browsers can decompress on the fly
  • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
  • Use Portable Network Graphics (PNGs)
  • Crush your PNGs
  • Strip needless metadata
  • Make all PNGs palette PNGs

[Figure: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images]
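A minimal sketch of on-the-fly gzip compression in an ASP.NET web role (e.g. from Global.asax), using System.IO.Compression; in practice IIS compression features may do this for you, so treat this as an illustration of the idea rather than the recommended configuration.

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string acceptEncoding = Request.Headers["Accept-Encoding"] ?? string.Empty;
        if (acceptEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // Wrap the response stream so all output is compressed as it is written.
            Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
            Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}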

                                                                                                                                Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                                                Measure everything

                                                                                                                                Know your application profile in and out

                                                                                                                                Research Examples in the Cloud

…on another set of slides

                                                                                                                                Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
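To illustrate the programming pattern such engines expose (this is not the Daytona API, which is REST based), here is a small map/reduce-style word count in C# using PLINQ.

using System;
using System.Collections.Generic;
using System.Linq;

static class MapReduceSketch
{
    // Map: emit (word, 1) pairs for every word in one input line.
    static IEnumerable<KeyValuePair<string, int>> Map(string line)
    {
        foreach (string w in line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            yield return new KeyValuePair<string, int>(w.ToLowerInvariant(), 1);
    }

    // Reduce: sum the counts emitted for one key.
    static KeyValuePair<string, int> Reduce(string key, IEnumerable<int> counts)
    {
        return new KeyValuePair<string, int>(key, counts.Sum());
    }

    static void Main()
    {
        string[] lines = { "the cloud scales", "the cloud stores", "scales and scales" };

        var results = lines.AsParallel()                      // map phase runs in parallel
                           .SelectMany(Map)
                           .GroupBy(kv => kv.Key, kv => kv.Value)
                           .Select(g => Reduce(g.Key, g));

        foreach (var kv in results.OrderByDescending(kv => kv.Value))
            Console.WriteLine("{0}\t{1}", kv.Key, kv.Value);
    }
}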


                                                                                                                                Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                Thank you for hosting me at the Summer School

                                                                                                                                BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences
Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700 ~ 1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

                                                                                                                                It is easy to parallelize BLAST

• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing
Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. AzureBlast: A Case Study of Developing Science Applications on the Cloud. In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes
Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead
Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting in case of an instance failure
• Best practice (see the sketch below)
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
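A hypothetical helper for the visibilityTimeout estimate described above; the calibration constant would come from test runs, the safety margin is an assumption, and the 2-hour clamp reflects the queue service limit mentioned on the slide.

using System;

public static class VisibilityTimeoutEstimator
{
    static readonly TimeSpan MaxVisibilityTimeout = TimeSpan.FromHours(2);   // queue service limit

    // secondsPerMillionPairBases is calibrated from test runs on the chosen instance size.
    public static TimeSpan Estimate(long pairBasesInPartition, double secondsPerMillionPairBases)
    {
        double seconds = (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;
        seconds *= 1.5;                                                      // safety margin over the estimate
        TimeSpan timeout = TimeSpan.FromSeconds(seconds);
        return timeout > MaxVisibilityTimeout ? MaxVisibilityTimeout : timeout;
    }
}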

[Figure: a Splitting task fans the input out to many BLAST tasks, and a Merging task joins their results; a code sketch of the splitting step follows]
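A minimal sketch of the query-segmentation (splitting) step under the assumption of FASTA-formatted input; AzureBLAST's actual splitter is not shown here. Each emitted partition would become one BLAST task, and a merging task would concatenate the per-partition outputs.

using System.Collections.Generic;
using System.IO;

static class QuerySegmenter
{
    // Split a FASTA file into partitions of at most sequencesPerPartition sequences.
    public static IEnumerable<List<string>> Split(string fastaPath, int sequencesPerPartition)
    {
        var partition = new List<string>();
        int sequencesInPartition = 0;

        foreach (string line in File.ReadLines(fastaPath))
        {
            if (line.StartsWith(">") && sequencesInPartition == sequencesPerPartition)
            {
                yield return partition;                  // emit a full partition as one task
                partition = new List<string>();
                sequencesInPartition = 0;
            }
            if (line.StartsWith(">")) sequencesInPartition++;
            partition.Add(line);
        }
        if (partition.Count > 0) yield return partition; // last, possibly short, partition
    }
}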

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice
Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity
Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Figure: AzureBLAST architecture – a Web Role (Web Portal and Web Service) performs job registration; a Job Management Role (Job Scheduler and Scaling Engine) dispatches work through a global dispatch queue to Worker instances; Azure Tables hold the Job Registry, Azure Blob storage holds the NCBI/BLAST databases and temporary data, and a Database Updating Role refreshes them; a Splitting task fans out BLAST tasks whose results a Merging task joins]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
Authentication/authorization based on Live ID
The accepted job is stored in the Job Registry table
• Fault tolerance: avoid in-memory state

[Figure: job portal components – Web Portal, Web Service, job registration, Job Scheduler, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec
AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences
"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons
Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop
One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record should look like the paired lines below; otherwise something is wrong (e.g., a task failed to complete). A small parsing sketch for spotting such gaps follows the excerpt.

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
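A minimal sketch of the kind of log check described above, assuming log lines follow the "Executing the task N" / "Execution of task N is done" pattern shown in the excerpt; the file name and exact format are illustrative.

```python
import re
import sys

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def find_incomplete_tasks(log_lines):
    """Return task IDs that were started but never reported as done."""
    started, finished = set(), set()
    for line in log_lines:
        m = EXEC_RE.search(line)
        if m:
            started.add(m.group(1))
        m = DONE_RE.search(line)
        if m:
            finished.add(m.group(1))
    return sorted(started - finished)

if __name__ == "__main__":
    with open(sys.argv[1]) as f:   # e.g. a per-instance log file (illustrative)
        missing = find_incomplete_tasks(f)
    print("Tasks with no completion record:", missing)
```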

North Europe Data Center: a total of 34,256 tasks processed.
All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain (~30 mins, ~6 nodes in one group).
35 nodes experienced blob-writing failures at the same time.
West Europe Data Center: 30,976 tasks were completed, and then the job was killed.
A reasonable guess: the Fault Domain is working.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

$$ET = \frac{\Delta\,R_n + \rho_a\,c_p\,\delta q\,g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
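To make the formula concrete, a minimal Python sketch of the Penman-Monteith calculation above; the default constants follow the definitions listed earlier, and the sample input values are purely illustrative rather than a validated parameterization.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2.45e3):
    """Evapotranspiration via the Penman-Monteith equation.

    delta    : rate of change of saturation specific humidity with air temp (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivities of air and plant stomata (m/s)
    gamma    : psychrometric constant (~66 Pa/K)
    lambda_v : latent heat of vaporization (J/g), ~2450 near 20 degrees C
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Illustrative values only.
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1000.0, g_a=0.02, g_s=0.01))
```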

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal receives requests; queues (Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue) connect the Data Collection Stage (fed by the source imagery download sites), the Reprojection Stage, the Derivation Reduction Stage, and the Analysis Reduction Stage; source metadata is recorded along the way and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and from/to local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances consume the Task Queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a queue-polling sketch follows below)
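A minimal sketch of the dequeue-and-retry loop described above, assuming a generic queue client with get/delete operations and a task-status table; the class and method names are illustrative, not the actual MODISAzure or Azure SDK API.

```python
import logging

MAX_ATTEMPTS = 3  # retries failed tasks 3 times, as described above

def worker_loop(task_queue, status_table, run_task):
    """Generic worker: dequeue a task, run it, record status, retry on failure."""
    while True:
        message = task_queue.get()          # hypothetical queue client
        if message is None:
            break                           # queue drained
        task = message.body
        attempts = status_table.get_attempts(task.id)
        try:
            run_task(task)                  # the sandboxed science executable
            status_table.mark_done(task.id)
            task_queue.delete(message)
        except Exception as err:
            logging.warning("task %s failed: %s", task.id, err)
            if attempts + 1 >= MAX_ATTEMPTS:
                status_table.mark_failed(task.id)
                task_queue.delete(message)  # give up after 3 attempts
            else:
                status_table.increment_attempts(task.id)
                # the message becomes visible again after its visibility timeout
```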

[Diagram (reprojection example): a Reprojection Request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue consumed by GenericWorker (Worker Role) instances.]

The status tables point to the underlying data:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
• Blob storage: Swath Source Data Storage and Reprojection Data Storage

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost annotations on the pipeline diagram (AzureMODIS Service Web Role Portal, Request Queue, Download Queue, Reprojection Queue, Reduction 1 and Reduction 2 Queues, Source Metadata, Source Imagery Download Sites, Scientific Results Download, Scientists):]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                • Day 2 - Azure as a PaaS
                                                                                                                                • Day 2 - Applications

Picking the Right VM Size

• Having the correct VM size can make a big difference in costs
• Fundamental choice: fewer, larger VMs vs. many smaller instances
• If you scale better than linearly across cores, larger VMs could save you money
• It is pretty rare to see linear scaling across 8 cores
• More instances may provide better uptime and reliability (more failures are needed to take your service down)
• The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you (a back-of-the-envelope comparison follows below)
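A back-of-the-envelope sketch of the fewer-large vs. many-small trade-off; the hourly rates, core counts, and scaling efficiency below are hypothetical placeholders to be replaced with your own measurements, not current Azure prices.

```python
# Hypothetical hourly rates and core counts -- plug in real measurements.
HOURLY_RATE = {"small": 0.12, "large": 0.48, "xlarge": 0.96}
CORES = {"small": 1, "large": 4, "xlarge": 8}

def cost_to_finish(work_core_hours, size, parallel_efficiency, instance_count):
    """Cost of completing a fixed amount of work on a given VM size.

    parallel_efficiency < 1.0 models sub-linear scaling across cores.
    """
    effective_cores = CORES[size] * parallel_efficiency * instance_count
    hours = work_core_hours / effective_cores
    return hours * HOURLY_RATE[size] * instance_count

work = 1000  # core-hours of work
print("8 small @ 100%% efficiency: $%.2f" % cost_to_finish(work, "small", 1.0, 8))
print("1 xlarge @ 80%% efficiency: $%.2f" % cost_to_finish(work, "xlarge", 0.8, 1))
```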

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance ≠ one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which barely uses its CPU
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • May not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • In .NET 4, use the Task Parallel Library (an analogous sketch follows below)
    • Data parallelism
    • Task parallelism
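The slide refers to the .NET 4 Task Parallel Library; as an analogous illustration (not the TPL API itself), a minimal Python sketch of data parallelism with the standard concurrent.futures module, spreading one CPU-bound operation across all cores of an instance.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def process_tile(tile_id):
    """Placeholder for CPU-bound work on one unit of data."""
    return tile_id, sum(i * i for i in range(100_000))

if __name__ == "__main__":
    tiles = range(32)
    # Data parallelism: the same operation applied to many items in parallel.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        for tile_id, result in pool.map(process_tile, tiles):
            print(tile_id, result)
```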

Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find code that is intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs


Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile (a small comparison sketch follows below)
• Caching and compressing: they help a lot with storage costs
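A minimal sketch of the flat-fee vs. per-transaction comparison mentioned above; the prices are hypothetical placeholders for illustration, not actual SQL Azure or Windows Azure Table rates.

```python
# Hypothetical prices -- substitute the rates from your actual bill.
SQL_AZURE_FLAT_FEE = 9.99              # $/month for a small database (assumed)
TABLE_TRANSACTION_PRICE = 0.01 / 100   # $ per storage transaction (assumed)

def monthly_storage_cost(transactions_per_month):
    """Compare a flat-fee service against a per-transaction one."""
    table_cost = transactions_per_month * TABLE_TRANSACTION_PRICE
    return {"sql_azure_flat": SQL_AZURE_FLAT_FEE, "azure_tables": table_cost}

for tx in (100_000, 10_000_000, 1_000_000_000):
    print(tx, monthly_storage_cost(tx))
```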

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places.
Sending fewer things also means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content (a minimal sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs
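A minimal sketch of gzip-compressing response content with the Python standard library; the sample payload and sizes are illustrative.

```python
import gzip

def gzip_response(body: bytes) -> bytes:
    """Compress an HTTP response body; clients advertise support via Accept-Encoding: gzip."""
    return gzip.compress(body, compresslevel=6)

html = b"<html><body>" + b"<p>hello cloud</p>" * 1000 + b"</body></html>"
compressed = gzip_response(html)
print(len(html), "bytes ->", len(compressed), "bytes")
# Remember to set the Content-Encoding: gzip header on the response.
```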

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]

Best Practices Summary

Doing "less" is the key to saving costs.
Measure everything.
Know your application profile inside and out.

                                                                                                                                  Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research is announcing this week a project code-named Daytona for MapReduce jobs on Azure (a generic sketch of the pattern follows below)
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
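To make the programming model concrete, a minimal, generic MapReduce word-count sketch in Python; this illustrates the map/reduce pattern only and is not the Daytona or Hadoop API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) pairs for one input document."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["clouds scale out", "clouds tolerate faults", "scale scale scale"]
word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
print(word_counts)
```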


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                  Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive:
• Large number of pairwise alignment operations
• A BLAST run can take 700~1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST:
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large volumes of data:
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance:
• The "-a" argument of NCBI-BLAST
• Set it to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity:
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• essentially an estimate of the task's run time
• too small: repeated computation
• too large: an unnecessarily long wait before the task is retried if an instance fails
• best practice: estimate the value from the number of base pairs in the partition plus test runs (see the sketch below)
• watch out for the 2-hour maximum visibility timeout
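A minimal sketch of such an estimate, assuming a per-million-base-pairs rate obtained from profiling test runs; the rate constant and safety margin are illustrative. The result would be passed as the visibility timeout when a worker dequeues the corresponding task.

```csharp
using System;

static class VisibilityTimeoutEstimator
{
    // Rough, illustrative estimate of a BLAST task's visibility timeout from the number of
    // base pairs in its partition. 'minutesPerMillionBases' would come from profiling runs.
    public static TimeSpan Estimate(long basePairsInPartition, double minutesPerMillionBases)
    {
        double minutes = basePairsInPartition / 1e6 * minutesPerMillionBases;
        minutes *= 1.5;                                    // safety margin so tasks rarely reappear early
        double capped = Math.Min(minutes, 120);            // visibility timeout was capped at 2 hours
        return TimeSpan.FromMinutes(Math.Max(capped, 1));  // never less than a minute
    }
}
// Usage (illustrative): pass the result to the queue's GetMessage(visibilityTimeout)
// call when dequeuing the task.
```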

[Diagram: a Splitting task fans out into multiple BLAST tasks, whose outputs are combined by a Merging task]

Task size vs. performance
• benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• super-linear speedup with larger worker instances
• primarily due to the larger memory

Task size / instance size vs. cost
• the extra-large instance gave the best and most economical throughput
• it fully utilizes the resources

[Diagram: AzureBLAST architecture: a Web Role hosts the web portal and web service (job registration, job scheduler); a Job Management Role contains the job scheduler, scaling engine, and database-updating role; worker instances pull work from a global dispatch queue; Azure Tables hold the job registry and NCBI database metadata; Azure Blob storage holds the BLAST databases, temporary data, etc.; each job executes as a splitting task, parallel BLAST tasks, and a merging task]

ASP.NET program hosted by a web role instance
• submit jobs
• track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• fault tolerance: avoid in-memory state (a sketch follows)
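A hedged sketch of what persisting an accepted job to a registry table might look like with the StorageClient table API of that SDK generation; the entity shape, table name, and status values are assumptions, not AzureBLAST's actual schema.

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Illustrative job-registry entity (not the real AzureBLAST schema).
public class JobEntry : TableServiceEntity
{
    public JobEntry() { }
    public JobEntry(string jobId)
    {
        PartitionKey = "jobs";   // a single partition keeps this sketch simple
        RowKey = jobId;
    }
    public string Owner { get; set; }       // Live ID of the submitter
    public string Status { get; set; }      // e.g. "Accepted", "Running", "Done"
    public string LogBlobName { get; set; } // where the job's logs end up
}

public static class JobRegistry
{
    // Persist an accepted job so that no state lives only in web-role memory.
    public static void Register(CloudStorageAccount account, JobEntry job)
    {
        CloudTableClient tables = account.CreateCloudTableClient();
        tables.CreateTableIfNotExist("jobregistry");

        TableServiceContext ctx = tables.GetDataServiceContext();
        ctx.AddObject("jobregistry", job);
        ctx.SaveChanges();
    }
}
```

Because the job record survives any web-role restart, a failed front end can simply be restarted and resume serving status queries from the table.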


R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)
• against all NCBI non-redundant proteins: completed in 30 min
• against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly reduced the computing time…

Discovering homologs
• discover the interrelationships of known protein sequences

"All against All" query
• the database is also the input query
• the protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• theoretically 100 billion sequence comparisons

Performance estimation
• based on sample runs on one extra-large Azure instance
• would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• experiments at this scale are usually infeasible for most scientists
• allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• the 10 million sequences were divided into multiple segments, each submitted to one deployment as one job for execution
• each segment consists of smaller partitions
• when load is imbalanced, redistribute it manually


• Total size of the output is ~230 GB
• Total number of hits: 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record comes in start/finish pairs like those below; anything else means something went wrong (e.g., a task failed to complete). A small parsing sketch follows the records.

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
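A small sketch of the kind of log mining used here: it pairs "Executing" and "is done" records per task and reports tasks that started but never finished, such as task 251774 in the records above. The log file name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class LogChecker
{
    static void Main()
    {
        // Matches records like:
        //   3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
        //   3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
        var started  = new Regex(@"Executing the task (\d+)");
        var finished = new Regex(@"Execution of task (\d+) is done");

        var open = new HashSet<string>();
        foreach (string line in File.ReadLines("worker.log"))   // illustrative file name
        {
            Match m;
            if ((m = started.Match(line)).Success) open.Add(m.Groups[1].Value);
            else if ((m = finished.Match(line)).Success) open.Remove(m.Groups[1].Value);
        }

        // Anything left in 'open' started but never completed.
        foreach (string taskId in open)
            Console.WriteLine("Task {0} has no completion record", taskId);
    }
}
```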

North Europe datacenter: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in groups: this is an update domain at work (~30 minutes, ~6 nodes per group)

35 nodes experienced blob-write failures at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed

A reasonable guess: the fault domain mechanism was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• lots of inputs; a big data reduction
• some of the inputs are not so simple

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
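The Penman-Monteith formula above translates directly into code; this is a minimal sketch that simply evaluates the equation with the variables as defined, leaving unit consistency to the caller.

```csharp
static class PenmanMonteith
{
    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    // Units follow the definitions listed above; the caller must keep them consistent.
    public static double Evapotranspiration(
        double delta,        // Δ  (Pa/K)
        double rn,           // Rn (W/m^2)
        double rhoA,         // ρa (kg/m^3)
        double cp,           // cp (J/(kg·K))
        double deltaQ,       // δq (Pa)
        double ga,           // ga (m/s)
        double gs,           // gs (m/s)
        double lambdaV,      // λv (J/g)
        double gamma = 66.0) // γ ≈ 66 Pa/K
    {
        return (delta * rn + rhoA * cp * deltaQ * ga)
             / ((delta + gamma * (1 + ga / gs)) * lambdaV);
    }
}
```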

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• downloads requested input tiles from NASA FTP sites
• includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• converts source tile(s) to intermediate-result sinusoidal tiles
• simple nearest-neighbor or spline algorithms

Derivation reduction stage
• first stage visible to the scientist
• computes ET in our initial use

Analysis reduction stage
• optional second stage visible to the scientist
• enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: MODISAzure pipeline: the AzureMODIS Service Web Role portal receives requests into a Request Queue; a Download Queue feeds the Data Collection stage (source imagery download sites), a Reprojection Queue feeds the Reprojection stage, and Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction stages; source metadata is kept alongside, and scientists download the scientific results]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • receives all user requests
  • queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • parses all job requests into tasks, i.e. recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage> Task Queue]

All work is actually done by a Worker Role
• sandboxes the science (or other) executable
• marshals all storage from/to Azure blob storage to/from local files on the Azure worker instance

[Diagram: the Service Monitor (Worker Role) dispatches <PipelineStage> tasks from the Task Queue to GenericWorker (Worker Role) instances, which read and write the <Input> data storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times (see the sketch below)
• Maintains all task status
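A minimal sketch of that dequeue-and-retry loop, assuming the StorageClient queue API of that SDK generation; the 30-minute visibility timeout, the task format, and ExecuteTask are illustrative, and the real GenericWorker also stages blobs and records task status in tables.

```csharp
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

class GenericWorkerLoop
{
    const int MaxAttempts = 3;   // retries failed tasks 3 times, as noted above

    static void Run(CloudQueue taskQueue)
    {
        while (true)
        {
            // Hide the message long enough for the task to finish (cf. visibilityTimeout).
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            if (msg.DequeueCount > MaxAttempts)
            {
                // Give up: mark the task failed (status table not shown) and drop the message.
                taskQueue.DeleteMessage(msg);
                continue;
            }

            try
            {
                ExecuteTask(msg.AsString);       // run the sandboxed executable for this task
                taskQueue.DeleteMessage(msg);    // delete only after success
            }
            catch (Exception)
            {
                // Do nothing: the message becomes visible again and the task is retried.
            }
        }
    }

    static void ExecuteTask(string taskDescription)
    {
        // Stage input blobs to local storage, run the executable, upload results (not shown).
    }
}
```

Deleting the message only after successful execution is what makes the task a recoverable unit of work: if the instance dies mid-task, the message reappears and another worker picks it up.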

[Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work via the Job Queue and Task Queue to GenericWorker (Worker Role) instances; the workers read Swath Source Data Storage and write Reprojection Data Storage]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: the MODISAzure pipeline (AzureMODIS Service Web Role portal, request/download/reprojection/reduction queues, scientists downloading results) annotated with per-stage resource usage and cost]

Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3,500 compute hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction stage: 5-7 GB, 55K files, 1,800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1,800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                  • Day 2 - Azure as a PaaS
                                                                                                                                  • Day 2 - Applications

Using Your VM to the Maximum

Remember:
• 1 role instance == 1 VM running Windows
• 1 role instance = one specific task for your code
• You're paying for the entire VM, so why not use it?
• Common mistake: splitting code into multiple roles, each of which leaves its CPU mostly idle
• Balance using up the CPU against keeping free capacity for times of need
• There are multiple ways to use your CPU to the fullest

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
  • may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
  • in networking code, correct use of NT I/O Completion Ports lets the kernel schedule the precise number of threads
  • in .NET 4, use the Task Parallel Library (see the sketch after this list)
    • data parallelism
    • task parallelism
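A short sketch of both TPL flavors in .NET 4; the work items are placeholders.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ConcurrencyPatterns
{
    static void Main()
    {
        // Data parallelism: apply the same operation to every item, letting the TPL
        // pick a degree of parallelism that matches the VM's cores.
        int[] inputs = Enumerable.Range(0, 1000).ToArray();
        Parallel.ForEach(inputs, item => Process(item));

        // Task parallelism: run different operations concurrently and wait for all of them.
        Task a = Task.Factory.StartNew(() => Console.WriteLine("compress logs"));
        Task b = Task.Factory.StartNew(() => Console.WriteLine("refresh cache"));
        Task.WaitAll(a, b);
    }

    static void Process(int item)
    {
        // CPU-bound work for one item (placeholder).
    }
}
```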

Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, storage I/O-intensive
• Find pieces of code that are intensive in different resources and let them live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)

• Spinning VMs up and down automatically is good at large scale

• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running

• Being too aggressive in spinning down VMs can result in a poor user experience

• Trade off the risk of failure or poor user experience from not having excess capacity against the cost of idling VMs (a toy sketch of such guard logic follows)
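For illustration only, the kind of guard logic this implies might look like the following; the thresholds, the metric source, and the instance-count bookkeeping are hypothetical stubs, not a real Azure management API:

    using System;

    // Hypothetical autoscaling guard: scale out quickly, scale in cautiously.
    class ScalingSketch
    {
        static int instances = 4;

        static void Main()
        {
            AdjustInstanceCount();
            Console.WriteLine("Target instance count: " + instances);
        }

        static void AdjustInstanceCount()
        {
            double avgCpu = GetAverageCpuLastTenMinutes();   // stub below

            if (avgCpu > 0.75)
            {
                // Add capacity early; new VMs take minutes to come up.
                instances += 2;
            }
            else if (avgCpu < 0.25 && instances > 2)
            {
                // Shrink one instance at a time to avoid thrashing and to keep
                // headroom against sudden load (and a poor user experience).
                instances -= 1;
            }
        }

        static double GetAverageCpuLastTenMinutes()
        {
            // Placeholder: in a real service this would come from diagnostics data.
            return 0.5;
        }
    }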

Performance vs. Cost

Storage Costs

• Understand the application's storage profile and how storage billing works

• Make service choices based on your app profile

  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction

• Service choice can make a big cost difference based on your app profile (a rough worked comparison follows)

• Caching and compression help a lot with storage costs
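A rough worked comparison of the two billing models; the prices below are hypothetical placeholders, not actual Azure rates, so plug in your own billing numbers and measured transaction profile:

    using System;

    // Illustrative only: flat monthly database fee vs. per-transaction table charges.
    class StorageCostSketch
    {
        static void Main()
        {
            double flatDbFeePerMonth = 10.0;                // hypothetical flat fee
            double pricePerTenThousandTransactions = 0.01;  // hypothetical rate

            long transactionsPerMonth = 200000000;          // from your app's measured profile
            double tableCost = (transactionsPerMonth / 10000.0) * pricePerTenThousandTransactions;

            Console.WriteLine("Flat database fee per month:    $" + flatDbFeePerMonth);
            Console.WriteLine("Table transaction cost per month: $" + tableCost);
        }
    }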

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage

Saving bandwidth costs often leads to savings in other places

Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (see the sketch below)

  • All modern browsers can decompress on the fly

  • Compared to Compress, Gzip has much better compression and is free from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes

  • Use Portable Network Graphics (PNGs)

  • Crush your PNGs

  • Strip needless metadata

  • Make all PNGs palette PNGs

[Diagram: Uncompressed Content → Gzip → Compressed Content, combined with minified JavaScript, minified CSS, and minified images]
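A minimal sketch of gzip-compressing ASP.NET output with a response filter, assuming it is wired up in Global.asax; IIS dynamic compression or a dedicated HTTP module are common alternatives:

    using System;
    using System.IO.Compression;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Application_BeginRequest(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender;
            string acceptEncoding = app.Request.Headers["Accept-Encoding"];

            // Only compress when the browser says it can decompress on the fly.
            if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
            {
                app.Response.Filter = new GZipStream(app.Response.Filter, CompressionMode.Compress);
                app.Response.AppendHeader("Content-Encoding", "gzip");
            }
        }
    }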

                                                                                                                                    Best Practices Summary

Doing 'less' is the key to saving costs

                                                                                                                                    Measure everything

                                                                                                                                    Know your application profile in and out

                                                                                                                                    Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs on the web

  • Hadoop implementation

  • Hadoop has a long history and has been improved for stability

  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure

  • Designed from the start to use cloud primitives

  • Built-in fault tolerance

  • REST-based interface for writing your own clients


                                                                                                                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                    Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics

• Identifies similarity between bio-sequences

Computationally intensive

• Large number of pairwise alignment operations

• A BLAST run can take 700–1,000 CPU hours

• Sequence databases are growing exponentially

  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST

• Segment the input

  • Segment processing (querying) is pleasingly parallel

• Segment the database (e.g., mpiBLAST)

  • Needs special result-reduction processing

Large data volumes

• A normal BLAST database can be as large as 10 GB

• With 100 nodes, the peak storage bandwidth demand could reach 1 TB

• The output of BLAST is usually 10–100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern

  • Split the input sequences

  • Query the partitions in parallel

  • Merge the results together when done

• Follows the generally suggested application model: Web Role + Queue + Worker

• With three special considerations:

  • Batch job management

  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), ACM, 21 June 2010.

A simple Split/Join pattern (sketched below)
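A minimal sketch of the query-segmentation (split/join) idea: a splitting task breaks the input sequences into fixed-size partitions that BLAST tasks can process independently, and a merging task concatenates the per-partition results. The partition size of 100 follows the test runs described later; the in-memory types and stub methods are simplifications, not the AzureBLAST implementation:

    using System;
    using System.Collections.Generic;

    class SplitJoinSketch
    {
        static void Main()
        {
            List<string> sequences = LoadInputSequences();          // stub
            List<List<string>> partitions = Split(sequences, 100);  // splitting task

            List<string> partialResults = new List<string>();
            foreach (List<string> partition in partitions)          // BLAST tasks (run in parallel in the real system)
                partialResults.Add(RunBlast(partition));            // stub

            string merged = string.Join(Environment.NewLine, partialResults);  // merging task
            Console.WriteLine(merged);
        }

        static List<List<string>> Split(List<string> items, int partitionSize)
        {
            List<List<string>> result = new List<List<string>>();
            for (int i = 0; i < items.Count; i += partitionSize)
                result.Add(items.GetRange(i, Math.Min(partitionSize, items.Count - i)));
            return result;
        }

        static List<string> LoadInputSequences() { return new List<string> { ">seq1", ">seq2" }; }
        static string RunBlast(List<string> partition) { return "hits for " + partition.Count + " sequences"; }
    }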

Leverage the multiple cores of one instance

• The "-a" argument of NCBI-BLAST

• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity

• Large partitions: load imbalance

• Small partitions: unnecessary overheads

  • NCBI-BLAST overhead

  • Data-transfer overhead

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task

• Essentially an estimate of the task run time

• Too small: repeated computation

• Too large: unnecessarily long waiting time in case of an instance failure

Best practice:

• Estimate the value based on the number of pair-bases in the partition and on test runs

• Watch out for the 2-hour maximum limitation (see the sketch below)
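A minimal sketch of choosing a per-task visibilityTimeout with the Windows Azure storage client of that era; the connection string, queue name, and estimate formula are placeholders (the real engine derives the estimate from the pair-bases in the partition plus test runs), and the queue is assumed to exist already:

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class VisibilityTimeoutSketch
    {
        static void Main()
        {
            CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");

            TimeSpan estimate = EstimateTaskRunTime(pairBases: 50000000);

            // Hide the message for roughly the expected run time, but never
            // longer than the 2-hour maximum the queue service allows.
            TimeSpan visibilityTimeout = estimate < TimeSpan.FromHours(2)
                ? estimate
                : TimeSpan.FromHours(2);

            CloudQueueMessage message = queue.GetMessage(visibilityTimeout);
            if (message != null)
            {
                // ... run the BLAST task described by the message ...
                queue.DeleteMessage(message);   // only after the work is safely done
            }
        }

        static TimeSpan EstimateTaskRunTime(long pairBases)
        {
            // Placeholder heuristic: calibrate against test runs in practice.
            return TimeSpan.FromMinutes(Math.Max(5, pairBases / 1000000));
        }
    }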

[Diagram: Splitting task → BLAST tasks (in parallel) → Merging task]

Task size vs. Performance

• Benefit of the warm-cache effect

• 100 sequences per partition is the best choice

Instance size vs. Performance

• Super-linear speedup with larger-size worker instances

• Primarily due to the larger memory capacity

Task Size / Instance Size vs. Cost

• The extra-large instance generated the best and most economical throughput

• Fully utilizes the resources

[Architecture diagram: a Web Portal / Web Service (Web Role) handles job registration into the Job Registry (Azure Table); a Job Management Role runs the Job Scheduler, Scaling Engine, and database-updating role; tasks flow through a global dispatch queue to Worker instances; Azure Blob storage holds the BLAST databases, NCBI databases, temporary data, etc.]


ASP.NET program hosted by a web role instance

• Submit jobs

• Track job status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table

• Fault tolerance: avoid in-memory state

[Diagram: the Web Portal / Web Service handle job registration; the Job Scheduler, Scaling Engine, and Job Registry sit behind the Job Portal]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5,000 proteins (700K sequences)

• Against all NCBI non-redundant proteins: completed in 30 min

• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs

• Discover the interrelationships of known protein sequences

"All against All" query

• The database is also the input query

• The protein database is large (42 GB)

  • 9,865,668 sequences to be queried in total

  • Theoretically 100 billion sequence comparisons

Performance estimation

• Based on sample runs on one extra-large Azure instance

• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know

• Experiments at this scale are usually infeasible for most scientists

• Allocated ~4,000 cores in total

  • 475 extra-large VMs (8 cores per VM), across four datacenters: US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST

  • Each deployment has its own co-located storage service

• Divide the 10 million sequences into multiple segments

  • Each segment is submitted to one deployment as one job for execution

  • Each segment consists of smaller partitions

• When load imbalances appear, redistribute the load manually


• The total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6–8 days

• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523

3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins

3/31/2010 6:25 RD00155D3611B0 Executing the task 251553

3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins

3/31/2010 6:44 RD00155D3611B0 Executing the task 251600

3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774

3/31/2010 9:50 RD00155D3611B0 Executing the task 251895

3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back in a group – this looks like an update domain (~30 mins, ~6 nodes in one group)

• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed

• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" – Irish proverb

ET = Water volume evapotranspired (m3 s-1 m-2)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)

λv = Latent heat of vaporization (J g-1)

Rn = Net radiation (W m-2)

cp = Specific heat capacity of air (J kg-1 K-1)

ρa = Dry air density (kg m-3)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s-1)

gs = Conductivity of plant stoma (inverse of rs) (m s-1)

γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs; a big data reduction

• Some of the inputs are not so simple

Penman-Monteith (1964):

$$ET = \frac{\Delta\, R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants (a small worked example follows)
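A worked sketch of the Penman-Monteith form shown above, with each argument matching one symbol from the variable list; the sample numbers are illustrative placeholders, not values from the MODISAzure pipeline:

    using System;

    class PenmanMonteithSketch
    {
        static void Main()
        {
            double et = Evapotranspiration(
                delta: 145.0,    // Δ  (Pa K-1)
                rn: 400.0,       // Rn (W m-2)
                rhoA: 1.2,       // ρa (kg m-3)
                cp: 1004.0,      // cp (J kg-1 K-1)
                deltaQ: 1000.0,  // δq (Pa)
                ga: 0.02,        // ga (m s-1)
                gs: 0.01,        // gs (m s-1)
                gamma: 66.0,     // γ  (Pa K-1)
                lambdaV: 2450.0  // λv (J g-1)
            );
            Console.WriteLine("ET = " + et);
        }

        static double Evapotranspiration(
            double delta, double rn, double rhoA, double cp,
            double deltaQ, double ga, double gs, double gamma, double lambdaV)
        {
            // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
            double numerator = delta * rn + rhoA * cp * deltaQ * ga;
            double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
            return numerator / denominator;
        }
    }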

• NASA MODIS imagery source archives: 5 TB (600K files)

• FLUXNET curated sensor dataset: 30 GB (960 files)

• FLUXNET curated field dataset: 2 KB (1 file)

• NCEP/NCAR: ~100 MB (4K files)

• Vegetative clumping: ~5 MB (1 file)

• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage

• Downloads requested input tiles from NASA FTP sites

• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage

• Converts source tile(s) to intermediate-result sinusoidal tiles

• Simple nearest-neighbor or spline algorithms

Derivation reduction stage

• First stage visible to the scientist

• Computes ET in our initial use

Analysis reduction stage

• Optional second stage visible to the scientist

• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal receives scientists' requests into a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source imagery comes from the download sites, source metadata is tracked alongside, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door

  • Receives all user requests

  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue

• Service Monitor is a dedicated Worker Role

  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places work on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read/write the <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status (a minimal worker-loop sketch follows)
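A minimal sketch of such a dequeue/retry loop with the Windows Azure storage client of that era; the queue name, visibility timeout, RunTask, and PersistTaskStatus are placeholders (the real GenericWorker sandboxes an executable and writes status to Azure Tables), and the queue is assumed to exist:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class GenericWorkerSketch
    {
        static void Main()
        {
            CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("reprojectiontasks");

            while (true)
            {
                CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (message == null)
                {
                    Thread.Sleep(TimeSpan.FromSeconds(10));   // nothing to do yet
                    continue;
                }

                if (message.DequeueCount > 3)
                {
                    // Already retried 3 times: record the failure and drop the task.
                    PersistTaskStatus(message.AsString, "Failed");
                    taskQueue.DeleteMessage(message);
                    continue;
                }

                try
                {
                    RunTask(message.AsString);                 // stub: sandboxed executable
                    PersistTaskStatus(message.AsString, "Done");
                    taskQueue.DeleteMessage(message);
                }
                catch (Exception)
                {
                    // Leave the message; it reappears when the visibility timeout
                    // expires and counts as another attempt.
                }
            }
        }

        static void RunTask(string taskDescription) { /* marshal blobs, run the executable */ }
        static void PersistTaskStatus(string taskDescription, string status) { /* write to an Azure Table */ }
    }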

[Diagram: a reprojection request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work through the Job Queue and Task Queue to GenericWorker (Worker Role) instances; workers consult the ScanTimeList and SwathGranuleMeta tables and read/write the Reprojection Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request

• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)

• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath

                                                                                                                                    tile

                                                                                                                                    Query this table to get the

                                                                                                                                    list of satellite scan times

                                                                                                                                    that cover a target tile

                                                                                                                                    Swath Source

                                                                                                                                    Data Storage

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: AzureMODIS service architecture. The AzureMODIS Service Web Role Portal feeds a Download Queue, Source Metadata Request Queue, Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue; data flows from the Source Imagery Download Sites through the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages, and scientists download the scientific results at the end.]

Approximate scale and cost per stage:

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                                                    • Day 2 - Applications

Exploiting Concurrency

• Spin up additional processes, each with a specific task or as a unit of concurrency
• This may not be ideal if the number of active processes exceeds the number of cores
• Use multithreading aggressively
• In networking code, correct usage of NT I/O Completion Ports lets the kernel schedule the precise number of threads
• In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism (a minimal sketch follows this list)
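To make the Task Parallel Library bullet concrete, here is a minimal C# sketch using .NET 4 APIs (Parallel.ForEach and Task.Factory.StartNew). The work items and the ProcessTile/BuildIndex/ComputeStats method names are hypothetical placeholders, not part of any Azure SDK.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrencyDemo
{
    static void Main()
    {
        var tiles = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Data parallelism: apply the same operation to every item,
        // letting the TPL choose a sensible degree of parallelism.
        Parallel.ForEach(tiles, tile => ProcessTile(tile));

        // Task parallelism: run independent operations concurrently.
        Task indexTask = Task.Factory.StartNew(() => BuildIndex(tiles));
        Task statsTask = Task.Factory.StartNew(() => ComputeStats(tiles));
        Task.WaitAll(indexTask, statsTask);
    }

    // Hypothetical work items standing in for real pipeline stages.
    static void ProcessTile(int tile) { Console.WriteLine("processed tile " + tile); }
    static void BuildIndex(List<int> tiles) { Console.WriteLine("index built"); }
    static void ComputeStats(List<int> tiles) { Console.WriteLine("stats computed"); }
}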

Finding Good Code Neighbors

• Typically, code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find code that is intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be good neighbors for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you are scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idle VMs (a queue-depth-based sizing sketch follows this list)
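One common way to drive this trade-off is to size the worker pool from the backlog in the work queue. The sketch below is illustrative only: the messages-per-instance target and the min/max bounds are made-up tuning knobs, the queue name is hypothetical, it assumes the Windows Azure StorageClient library of that era, and the actual instance-count change would still have to be applied through the Service Management API or the portal (not shown).

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ScalingSketch
{
    static void Main()
    {
        // Development storage for illustration; use your real account in the cloud.
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("tasks");
        taskQueue.CreateIfNotExist();

        // Illustrative tuning knobs, not recommendations.
        const int messagesPerInstance = 50;
        const int minInstances = 2;      // keep some capacity for responsiveness
        const int maxInstances = 100;    // cap spend

        int backlog = taskQueue.RetrieveApproximateMessageCount();
        int target = Math.Max(minInstances,
                     Math.Min(maxInstances, (backlog + messagesPerInstance - 1) / messagesPerInstance));

        // Applying 'target' means updating the role's instance count via the
        // Service Management API or the portal; remember new VMs take minutes to start.
        Console.WriteLine("Queue backlog {0} -> target instance count {1}", backlog, target);
    }
}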

Performance & Cost

                                                                                                                                      Storage Costs

                                                                                                                                      bull Understand an applicationrsquos storage profile and how storage billing works

                                                                                                                                      bull Make service choices based on your app profile

                                                                                                                                      bull Eg SQL Azure has a flat fee while Windows Azure Tables charges per transaction

                                                                                                                                      bull Service choice can make a big cost difference based on your app profile

                                                                                                                                      bull Caching and compressing They help a lot with storage costs
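To make the flat-fee versus per-transaction point concrete, here is a back-of-the-envelope sketch. The rates are deliberately made-up placeholders, not Azure's actual prices; only the shape of the comparison matters: a transaction-billed store tends to win for low-traffic data, while a flat-fee database tends to win as transaction volume grows.

using System;

class StorageCostSketch
{
    static void Main()
    {
        // Placeholder rates for illustration only; check current pricing before deciding.
        const double flatFeePerMonth = 10.0;            // hypothetical flat database fee
        const double pricePerMillionTransactions = 1.0; // hypothetical per-transaction rate

        foreach (double millions in new[] { 0.5, 5.0, 50.0 })
        {
            double perTransactionCost = millions * pricePerMillionTransactions;
            string cheaper = perTransactionCost < flatFeePerMonth ? "per-transaction" : "flat-fee";
            Console.WriteLine("{0}M transactions/month: per-transaction ${1:F2} vs flat ${2:F2} -> {3} wins",
                              millions, perTransactionCost, flatFeePerMonth, cheaper);
        }
    }
}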

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's bill
• Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth often leads to savings elsewhere
• Sending fewer things also means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
   • A minimal response-filter sketch follows this slide
2. Trade compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content.]
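A minimal sketch of the "gzip all output" item, assuming classic ASP.NET: a static helper you would call from Application_BeginRequest in Global.asax that wraps the response stream in a GZipStream when the client advertises gzip support. The CompressionHelper name is hypothetical.

using System.IO.Compression;
using System.Web;

// Hypothetical helper; call CompressionHelper.TryGzip(HttpContext.Current)
// from Application_BeginRequest in Global.asax.
public static class CompressionHelper
{
    public static void TryGzip(HttpContext context)
    {
        string acceptEncoding = context.Request.Headers["Accept-Encoding"];
        if (string.IsNullOrEmpty(acceptEncoding) || !acceptEncoding.Contains("gzip"))
            return; // client cannot decompress gzip on the fly

        // Wrap the output stream so everything written to the response is gzipped.
        context.Response.Filter = new GZipStream(context.Response.Filter, CompressionMode.Compress);
        context.Response.AppendHeader("Content-Encoding", "gzip");
        context.Response.AppendHeader("Vary", "Accept-Encoding");
    }
}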

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile inside and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx


Questions and Discussion…

Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth demand can reach 1 TB (each node pulling its own copy of the database)
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker (a sketch of this pattern follows the citation below)
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.
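A minimal sketch of the split step in the Web Role + Queue + Worker pattern, assuming the Windows Azure StorageClient library of that era. The queue name, job id, partition size, and the SequencePartitioner type are hypothetical; the real AzureBLAST splitting task also stages the partition data into blob storage, which is only hinted at in the comments.

using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical splitting task: partition the input sequences and enqueue one
// message per partition for the worker roles to pick up.
class SequencePartitioner
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        // Pretend input: in reality the sequences come from the submitted job's blob.
        List<string> sequences = LoadInputSequences();
        const int sequencesPerPartition = 100;   // illustrative partition size

        for (int start = 0; start < sequences.Count; start += sequencesPerPartition)
        {
            // In AzureBLAST the partition itself is written to blob storage;
            // the queue message only carries a reference (job id + partition id).
            string message = string.Format("job42;partition{0}", start / sequencesPerPartition);
            taskQueue.AddMessage(new CloudQueueMessage(message));
        }
    }

    static List<string> LoadInputSequences()
    {
        return new List<string> { ">seq1 ...", ">seq2 ...", ">seq3 ..." };
    }
}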

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST sets the thread count
• Use 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
• Best practice: do test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of an instance failure
• Best practice: estimate the value from the number of base pairs in the partition plus test runs (see the worker-loop sketch below)
• Watch out for the 2-hour maximum limitation
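A minimal worker-loop sketch for the dequeue side, again assuming the StorageClient library of that era. The EstimateMinutes heuristic and its constants are made up for illustration; the point is that the visibility timeout passed to GetMessage comes from an estimate of the task run time, capped below the 2-hour queue limit, and that DequeueCount gives a retry count.

using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlastWorkerLoop
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("blasttasks");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // Hide the message for roughly the expected task duration.
            TimeSpan visibilityTimeout = TimeSpan.FromMinutes(EstimateMinutes(pairBasesInPartition: 5000000));
            CloudQueueMessage msg = taskQueue.GetMessage(visibilityTimeout);
            if (msg == null) { Thread.Sleep(5000); continue; }                    // queue empty, back off

            if (msg.DequeueCount > 3) { taskQueue.DeleteMessage(msg); continue; } // give up after 3 retries

            RunBlastTask(msg.AsString);     // download partition, run NCBI-BLAST, upload result
            taskQueue.DeleteMessage(msg);   // only delete after the work has succeeded
        }
    }

    // Illustrative heuristic: scale with partition size, stay under the 2-hour cap.
    static double EstimateMinutes(long pairBasesInPartition)
    {
        double minutes = pairBasesInPartition / 100000.0;   // made-up rate derived from test runs
        return Math.Min(minutes, 115);                      // keep below the 2-hour maximum
    }

    static void RunBlastTask(string taskDescription)
    {
        Console.WriteLine("running " + taskDescription);
    }
}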

[Diagram: a Splitting task fans the input out into many BLAST tasks, which a Merging task joins back together.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and the most economical throughput
• It fully utilizes the resources

[Diagram: AzureBLAST system architecture. A Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine, recording jobs in a Job Registry Azure Table and dispatching work onto a global dispatch queue; Worker instances consume the queue; Azure Blob storage holds the NCBI/BLAST databases and temporary data, refreshed by a database-updating Role. The Splitting task / BLAST tasks / Merging task flow from the previous slide runs on the workers.]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state


R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (42 GB)
• 9,865,668 sequences in total to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• The full job would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When the load is imbalanced, redistribute it manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (about 10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record is an "Executing" line followed by a matching "Execution ... is done" line; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Note: task 251774 has no completion record.)
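A small sketch of the kind of log analysis hinted at above: pair each "Executing the task N" record with its "Execution of task N is done" record and flag tasks that never completed. The regular expressions assume the log format shown in the sample records; the log file name is hypothetical.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class LogChecker
{
    static void Main()
    {
        var started = new HashSet<string>();
        var completed = new HashSet<string>();

        // Hypothetical log file containing records like the sample above.
        foreach (string line in File.ReadLines("worker.log"))
        {
            Match exec = Regex.Match(line, @"Executing the task (\d+)");
            if (exec.Success) started.Add(exec.Groups[1].Value);

            Match done = Regex.Match(line, @"Execution of task (\d+) is done");
            if (done.Success) completed.Add(done.Groups[1].Value);
        }

        // Tasks that were started but never reported completion are suspect.
        foreach (string task in started)
            if (!completed.Contains(task))
                Console.WriteLine("Task {0} has no completion record", task);
    }
}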

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back as a group (~6 nodes per group, ~30 mins): this is an update domain rolling through
• 35 nodes experienced a blob-writing failure at the same time

West Europe datacenter: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" - Irish proverb

ET  = water volume evapotranspired (m^3 s^-1 m^-2)
Δ   = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv  = latent heat of vaporization (J g^-1)
Rn  = net radiation (W m^-2)
cp  = specific heat capacity of air (J kg^-1 K^-1)
ρa  = dry air density (kg m^-3)
δq  = vapor pressure deficit (Pa)
ga  = conductivity of air (inverse of ra) (m s^-1)
gs  = conductivity of plant stoma air (inverse of rs) (m s^-1)
γ   = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = \frac{\Delta R_n + \rho_a c_p \,\delta q \, g_a}{\left(\Delta + \gamma \left(1 + g_a / g_s\right)\right) \lambda_v}

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
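As a quick illustration of the Penman-Monteith form above, here is a minimal Python sketch that plugs the listed variables into the formula. The function name, default values, and the example inputs are illustrative assumptions (γ ≈ 66 Pa K^-1 comes from the definitions above; λv ≈ 2.45 MJ/kg is a standard approximate value); units simply need to be kept consistent.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2.45e6):
    """Evapotranspiration via the Penman-Monteith form shown above.

    delta    : rate of change of saturation specific humidity with air temp (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivities of air and plant stoma (m/s)
    gamma    : psychrometric constant (Pa/K)
    lambda_v : latent heat of vaporization (J/kg)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator  # water flux per unit area

# Illustrative (made-up) mid-latitude summer values:
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                         c_p=1004.0, dq=1200.0, g_a=0.02, g_s=0.01))
```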

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal accepts requests into a Request Queue; source imagery download sites feed the Data Collection stage via a Download Queue; work then flows through the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue across the Reprojection, Derivation Reduction, and Analysis Reduction stages; scientists download the scientific results, with source metadata tracked alongside.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
(A sketch of this job-to-task flow follows the diagram below.)

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role); the job and its <PipelineStage>JobStatus are persisted and the job is placed on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage>Task Queue.]
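The job-to-task flow above can be sketched without any Azure dependencies. In this minimal Python sketch, `queue.Queue` stands in for the Azure queues and plain dicts stand in for the JobStatus/TaskStatus tables; names such as `parse_job_into_tasks` and the one-task-per-tile rule are illustrative assumptions, not part of any Azure API or of the MODISAzure code.

```python
import queue
import uuid

job_queue = queue.Queue()    # stand-in for the <PipelineStage>Job Queue
task_queue = queue.Queue()   # stand-in for the <PipelineStage>Task Queue
job_status = {}              # stand-in for the <PipelineStage>JobStatus table
task_status = {}             # stand-in for the <PipelineStage>TaskStatus table

def web_role_submit(request):
    """Web Role front door: persist the job record, then enqueue it."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = {"request": request, "state": "Queued"}
    job_queue.put(job_id)
    return job_id

def parse_job_into_tasks(request):
    """Hypothetical parser: one recoverable task per tile named in the request."""
    return [{"tile": t} for t in request["tiles"]]

def service_monitor_step():
    """Service Monitor: parse a job into tasks and dispatch them."""
    job_id = job_queue.get()
    for i, task in enumerate(parse_job_into_tasks(job_status[job_id]["request"])):
        task_id = f"{job_id}-{i}"
        task_status[task_id] = {"task": task, "state": "Dispatched", "retries": 0}
        task_queue.put(task_id)
    job_status[job_id]["state"] = "Dispatched"

job = web_role_submit({"tiles": ["h08v05", "h09v05"]})
service_monitor_step()
print(job_status[job], list(task_status))
```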

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read/write <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(A minimal worker loop is sketched below.)
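Here is a self-contained sketch of that worker loop: dequeue a task, run it, retry up to 3 times, and record the status. The `run_task` body and the in-memory queue are stand-ins for "sandbox the executable and marshal blobs", not the Azure SDK or the actual GenericWorker code.

```python
import queue

MAX_RETRIES = 3
task_queue = queue.Queue()
task_status = {}

def run_task(task_id):
    # Placeholder for sandboxing the science executable and marshalling
    # blobs to/from local instance storage.
    if task_id.endswith("bad"):
        raise RuntimeError("simulated failure")

def generic_worker_step():
    task_id = task_queue.get()
    record = task_status[task_id]
    try:
        run_task(task_id)
        record["state"] = "Succeeded"
    except Exception as err:
        record["retries"] += 1
        if record["retries"] < MAX_RETRIES:
            task_queue.put(task_id)        # make it visible again for a retry
            record["state"] = f"Retrying ({record['retries']})"
        else:
            record["state"] = f"Failed: {err}"

for tid in ("tile-ok", "tile-bad"):
    task_status[tid] = {"state": "Dispatched", "retries": 0}
    task_queue.put(tid)
for _ in range(5):
    if not task_queue.empty():
        generic_worker_step()
print(task_status)
```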

[Diagram: a reprojection request is persisted as ReprojectionJobStatus and placed on the Job Queue; the Service Monitor (Worker Role) parses it, persists ReprojectionTaskStatus, and dispatches onto the Task Queue, from which GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]

Table semantics:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Per-stage scale and cost (AzureMODIS Service Web Role Portal pipeline):

• Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection stage: 400 GB, 45K files, 3500 CPU hours, 20-100 workers; $420 compute, $60 download
• Derivation Reduction stage: 5-7 GB, 55K files, 1800 CPU hours, 20-100 workers; $216 compute, $1 download, $6 storage
• Analysis Reduction stage: <10 GB, ~1K files, 1800 CPU hours, 20-100 workers; $216 compute, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                      • Day 2 - Azure as a PaaS
                                                                                                                                      • Day 2 - Applications

Finding Good Code Neighbors

• Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network I/O-intensive, or storage I/O-intensive
• Find code that is intensive in different resources to live together
• Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage I/O-intensive code

Scaling Appropriately

• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of idling VMs: performance versus cost (a toy scaling rule is sketched below)
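To make the trade-off concrete, here is a toy scaling rule in Python: keep a little headroom so users do not see failures, but scale down slowly so you are not paying for a fleet of idle VMs. The queue-depth metric, thresholds, and headroom factor are assumptions for illustration, not Azure guidance.

```python
def target_instance_count(current, queue_depth, tasks_per_instance_per_min,
                          min_instances=2, max_instances=50, headroom=1.2):
    """Return how many worker instances we would like to be running."""
    # Enough instances to drain the backlog in about a minute, plus headroom.
    needed = queue_depth / max(tasks_per_instance_per_min, 1)
    desired = int(needed * headroom) + 1
    # Scale down slowly (one instance at a time) to avoid thrashing, since
    # each VM takes minutes to start and ~$3/day to keep running.
    if desired < current:
        desired = current - 1
    return max(min_instances, min(max_instances, desired))

print(target_instance_count(current=10, queue_depth=400, tasks_per_instance_per_min=20))
print(target_instance_count(current=10, queue_depth=10, tasks_per_instance_per_min=20))
```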

Storage Costs

• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • The service choice can make a big cost difference depending on your app profile (see the worked comparison below)
• Caching and compressing help a lot with storage costs
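A back-of-the-envelope comparison, in the spirit of the flat-fee versus per-transaction point above. All prices in this sketch are illustrative assumptions, not actual (or 2011) Azure rates; plug in the numbers from your own bill.

```python
FLAT_FEE_PER_MONTH = 10.00     # assumed flat monthly fee (SQL-style service)
PER_GB_MONTH = 0.15            # assumed $/GB-month (table-style service)
PER_10K_TRANSACTIONS = 0.01    # assumed $ per 10,000 transactions

def flat_fee_cost(gb, transactions):
    return FLAT_FEE_PER_MONTH  # size and transactions already included

def per_transaction_cost(gb, transactions):
    return gb * PER_GB_MONTH + (transactions / 10_000) * PER_10K_TRANSACTIONS

for gb, tx in [(1, 100_000), (1, 500_000_000)]:
    print(f"{gb} GB, {tx:,} tx/month: flat={flat_fee_cost(gb, tx):.2f} "
          f"per-tx={per_transaction_cost(gb, tx):.2f}")
```

A low-traffic app is far cheaper on the per-transaction service, while a transaction-heavy app flips the answer, which is exactly why the app profile drives the choice.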

Saving Bandwidth Costs

• Bandwidth costs are a huge part of any popular web app's bill
• Sending fewer things over the wire often means getting fewer things from storage
• Saving bandwidth costs often leads to savings in other places
• Sending fewer things means your VM has time to do other tasks
• All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (a standard-library sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, CSS, and images.]
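A small illustration of point 1: gzip-compressing a response body before it goes over the wire (or into blob storage). This uses only the Python standard library; the sample payload is made up.

```python
import gzip

html = ("<html><body>" + "<p>Windows Azure best practices</p>" * 200 +
        "</body></html>").encode("utf-8")

compressed = gzip.compress(html)
print(f"uncompressed: {len(html):,} bytes")
print(f"gzip:         {len(compressed):,} bytes "
      f"({100 * len(compressed) / len(html):.1f}% of original)")

# A browser that sent "Accept-Encoding: gzip" decompresses this on the fly;
# the server only needs to add "Content-Encoding: gzip" to the response.
assert gzip.decompress(compressed) == html
```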

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile inside and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations:
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multi-core capability of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting period in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (a sketch follows)
• Watch out for the 2-hour maximum limitation
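A sketch of that best practice: derive the timeout from the partition size and a throughput rate measured in your own test runs. The sample rate and safety factor below are assumptions for illustration, not measured NCBI-BLAST figures.

```python
MAX_VISIBILITY_TIMEOUT_S = 2 * 60 * 60   # the 2-hour queue maximum noted above

def estimate_visibility_timeout(pair_bases_in_partition,
                                pair_bases_per_second,   # measured in test runs
                                safety_factor=1.5):
    estimate = pair_bases_in_partition / pair_bases_per_second * safety_factor
    # Too small -> repeated computation; too large -> a long wait after an
    # instance failure; and never exceed the service maximum.
    return min(int(estimate), MAX_VISIBILITY_TIMEOUT_S)

print(estimate_visibility_timeout(pair_bases_in_partition=2_000_000_000,
                                  pair_bases_per_second=1_500_000))
```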

[Diagram: a Splitting task fans out into many BLAST tasks, whose outputs feed a Merging task.]
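The split/join flow in the diagram above can be sketched as a few lines of Python. Here `run_blast` is a placeholder for invoking NCBI-BLAST on a partition (e.g., via subprocess with "-a <cores>"); it is not a real BLAST binding, and the 100-sequence partition size echoes the result reported later.

```python
from concurrent.futures import ProcessPoolExecutor

def split(sequences, partition_size=100):
    """Splitting task: break the input sequences into partitions."""
    for i in range(0, len(sequences), partition_size):
        yield sequences[i:i + partition_size]

def run_blast(partition):
    """BLAST task (placeholder): pretend each sequence yields one hit line."""
    return [f"hit for {seq[:10]}" for seq in partition]

def merge(partial_results):
    """Merging task: concatenate per-partition outputs in order."""
    return [line for part in partial_results for line in part]

if __name__ == "__main__":
    sequences = [f"SEQ{i:05d}" * 3 for i in range(1_000)]
    with ProcessPoolExecutor() as pool:
        results = merge(pool.map(run_blast, split(sequences)))
    print(len(results), "result lines")
```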

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capability

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resource

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work through a global dispatch queue to Worker instances; Azure Tables hold the Job Registry, Azure Blobs hold the NCBI/BLAST databases and temporary data, and a database-updating Role refreshes the BLAST databases; Splitting, BLAST, and Merging tasks flow through the dispatch queue.]

Web portal: an ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
• Authentication/authorization based on Live ID
• An accepted job is stored in the job registry table; keeping no in-memory state gives fault tolerance
(Front-end components: Web Portal, Web Service, Job Registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry. A minimal sketch of the job-registration pattern follows below.)
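To make the fault-tolerance point concrete, here is a minimal sketch, not the actual AzureBLAST code, of the pattern described above: the front end persists every accepted job to a registry table and puts work on a dispatch queue, so no state lives only in memory. A Python dict and queue stand in for Azure Table and Queue storage; names such as JobRequest and submit_job are illustrative.

```python
import queue
import uuid
from dataclasses import dataclass

# Stand-ins for Azure Table storage (job registry) and the global dispatch queue.
JOB_REGISTRY: dict[str, dict] = {}
DISPATCH_QUEUE: "queue.Queue[dict]" = queue.Queue()

@dataclass
class JobRequest:
    owner: str                      # Live ID of the submitting user
    query_blob: str                 # blob name holding the input sequences
    partitions: int = 8             # how many BLAST tasks to fan out

def submit_job(request: JobRequest) -> str:
    """Accept a job: persist it first, then enqueue its tasks."""
    job_id = str(uuid.uuid4())
    # Persist before doing any work, so a front-end crash loses nothing.
    JOB_REGISTRY[job_id] = {"owner": request.owner, "status": "accepted"}
    for part in range(request.partitions):
        DISPATCH_QUEUE.put({"job_id": job_id, "partition": part,
                            "query_blob": request.query_blob})
    JOB_REGISTRY[job_id]["status"] = "dispatched"
    return job_id

def job_status(job_id: str) -> str:
    """The portal reads status from the registry, never from memory."""
    return JOB_REGISTRY[job_id]["status"]

job = submit_job(JobRequest(owner="user@live.com", query_blob="queries/seg-001.fasta"))
print(job, job_status(job))
```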

Case study: R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
BLASTed ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec
AzureBLAST saved a significant amount of computing time…

Discovering homologs: discovering the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
• The protein database is large (4.2 GB); in total 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation:
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of; experiments at this scale are usually infeasible for most scientists.
• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST; each deployment has its own co-located storage service
• The 10 million sequences are divided into multiple segments; each segment is submitted to one deployment as one job for execution (a minimal partitioning sketch follows below)
• Each segment consists of smaller partitions
• When load imbalances appear, the load is redistributed manually
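A minimal sketch of the segmentation step described above: divide the ~10 million input sequences into one segment per deployment, then break each segment into smaller partitions. The counts and the simple range-based split are illustrative assumptions, not the project's actual scheduler.

```python
def make_segments(num_sequences: int, num_deployments: int, partitions_per_segment: int):
    """Split [0, num_sequences) into one segment per deployment,
    then split each segment into smaller partitions (index ranges)."""
    seg_size = -(-num_sequences // num_deployments)  # ceiling division
    plan = []
    for d in range(num_deployments):
        seg_start = d * seg_size
        seg_end = min(seg_start + seg_size, num_sequences)
        part_size = -(-(seg_end - seg_start) // partitions_per_segment)
        partitions = [(s, min(s + part_size, seg_end))
                      for s in range(seg_start, seg_end, part_size)]
        plan.append({"deployment": d, "segment": (seg_start, seg_end),
                     "partitions": partitions})
    return plan

# 10M sequences across 8 deployments, 500 partitions per segment (illustrative numbers).
plan = make_segments(10_000_000, 8, 500)
print(plan[0]["segment"], len(plan[0]["partitions"]))
```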


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the following; otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
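A hedged sketch of the kind of log analysis used here: scan each node's log for "Executing the task N" lines that are never followed by a matching "Execution of task N is done" line. The parsing below is illustrative, not the team's actual tooling, and assumes the log format shown in the excerpt above.

```python
import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def lost_tasks(log_lines):
    """Return task IDs that were started but never reported done."""
    started, finished = set(), set()
    for line in log_lines:
        if m := EXEC_RE.search(line):
            started.add(m.group(1))
        elif m := DONE_RE.search(line):
            finished.add(m.group(1))
    return started - finished

sample = [
    "3/31/2010 6:14 RD00155D3611B0 Executing the task 251523",
    "3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins",
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
]
print(lost_tasks(sample))   # {'251774'}
```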

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart; this is an update domain at work
West Europe data center: 30,976 tasks were completed, then the job was killed
• 35 nodes experienced blob-writing failures at the same time; a reasonable guess is that a fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):
ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
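As a quick check of the formula, here is a minimal Python sketch of the Penman-Monteith calculation exactly as written above. Parameter names follow the slide's definitions; the sample input values are made up for illustration and are not MODIS-derived data.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2450.0):
    """ET per the Penman-Monteith form above.

    delta    : rate of change of saturation specific humidity with air temperature (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivity of air / plant stoma (m/s)
    gamma    : psychrometric constant (~66 Pa/K)
    lambda_v : latent heat of vaporization (J/g)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Illustrative (made-up) inputs only; real runs use MODIS/FLUXNET-derived values.
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1000.0, g_a=0.02, g_s=0.01))
```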

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms
Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use
Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: scientists interact with the AzureMODIS Service web role portal; a Request Queue and a Download Queue drive the Data Collection Stage, which pulls source imagery from download sites; the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue drive the Reprojection, Derivation Reduction, and Analysis Reduction Stages; source metadata is tracked throughout, and scientific results are available for download.]

                                                                                                                                        httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks, which are recoverable units of work
• The execution status of all jobs and tasks is persisted in Tables
(A minimal sketch of this parse-and-dispatch flow appears after the diagram note below.)

[Diagram: the MODISAzure Service (Web Role) persists each <PipelineStage> request as a <PipelineStage>JobStatus entry and enqueues it on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work items onto the <PipelineStage> Task Queue.]
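A minimal sketch of the Service Monitor step just described: take a job off the job queue, parse it into recoverable task records, persist each task's status, and dispatch the tasks. Python dict/queue stand-ins replace Azure Tables and Queues; names like parse_job_into_tasks are illustrative, not the MODISAzure code.

```python
import queue

JOB_QUEUE: "queue.Queue[dict]" = queue.Queue()     # <PipelineStage> Job Queue stand-in
TASK_QUEUE: "queue.Queue[dict]" = queue.Queue()    # <PipelineStage> Task Queue stand-in
TASK_STATUS: dict[tuple, str] = {}                 # <PipelineStage>TaskStatus table stand-in

def parse_job_into_tasks(job: dict) -> list[dict]:
    """Break a job into recoverable units of work (here: one task per tile)."""
    return [{"job_id": job["job_id"], "tile": t} for t in job["tiles"]]

def service_monitor_step():
    """One iteration of the Service Monitor loop."""
    job = JOB_QUEUE.get()
    for task in parse_job_into_tasks(job):
        key = (task["job_id"], task["tile"])
        TASK_STATUS[key] = "queued"        # persist status before dispatching
        TASK_QUEUE.put(task)

JOB_QUEUE.put({"job_id": "reproj-42", "tiles": ["h08v05", "h09v05"]})
service_monitor_step()
print(TASK_STATUS)
```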

All work is actually done by a GenericWorker (Worker Role):
• Sandboxes the science (or other) executable
• Marshalls all storage to/from Azure blob storage and to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus entries and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances pull tasks from the queue and read/write the <Input> data storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks up to 3 times
• Maintains all task status
(A minimal worker-loop sketch follows below.)
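A minimal worker-loop sketch of the GenericWorker behavior just listed (dequeue, run, retry up to 3 times, record status). The run_task placeholder and the in-memory stand-ins are assumptions for illustration, not the MODISAzure implementation.

```python
import queue

TASK_QUEUE: "queue.Queue[dict]" = queue.Queue()
TASK_STATUS: dict[str, str] = {}
MAX_ATTEMPTS = 3

def run_task(task: dict) -> None:
    """Placeholder for staging blobs to local files and running the science executable."""
    if task.get("fail"):
        raise RuntimeError("simulated failure")

def generic_worker_step():
    task = TASK_QUEUE.get()
    task_id = task["id"]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            TASK_STATUS[task_id] = f"running (attempt {attempt})"
            run_task(task)
            TASK_STATUS[task_id] = "done"
            return
        except Exception:
            TASK_STATUS[task_id] = f"failed (attempt {attempt})"
    # After 3 failed attempts the task stays marked failed for later inspection.

TASK_QUEUE.put({"id": "reproj-42/h08v05"})
generic_worker_step()
print(TASK_STATUS)
```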

[Reprojection example diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus and ReprojectionTaskStatus entries and dispatches onto the Task Queue; GenericWorker instances read swath source data storage and write reprojection data storage, consulting the SwathGranuleMeta and ScanTimeList tables.]
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction stages multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Cost breakdown by pipeline stage:
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload + $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU + $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU + $1 download + $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU + $2 download + $9 storage
Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                                                        • Day 2 - Applications

Scaling Appropriately
• Monitor your application and make sure you're scaled appropriately (not over-scaled)
• Spinning VMs up and down automatically is good at large scale
• Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
• Being too aggressive in spinning down VMs can result in a poor user experience
• There is a trade-off between the risk of failure or a poor user experience from not having excess capacity and the cost of having idle VMs (see the scaling sketch below)
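A hedged sketch of that trade-off: scale out quickly when load is high, but scale in conservatively and never below a floor, so a brief lull does not trigger VM churn. The thresholds, the $3/day figure, and the function name are illustrative assumptions, not an Azure API.

```python
DAILY_COST_PER_VM = 3.0          # rough figure from the slide, give or take
MIN_INSTANCES = 2                # always keep some headroom for spikes
SCALE_OUT_UTIL = 0.75            # add capacity above 75% average utilization
SCALE_IN_UTIL = 0.30             # only remove capacity below 30% utilization

def desired_instance_count(current: int, avg_utilization: float) -> int:
    """Decide the next instance count from the current average utilization."""
    if avg_utilization > SCALE_OUT_UTIL:
        return current + 1                       # scale out eagerly
    if avg_utilization < SCALE_IN_UTIL and current > MIN_INSTANCES:
        return current - 1                       # scale in cautiously
    return current                               # otherwise hold steady

# Example: 10 VMs at 20% utilization -> drop one VM, saving ~$3 a day.
print(desired_instance_count(10, 0.20), "instances")
```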

                                                                                                                                          Performance Cost

Storage Costs
• Understand your application's storage profile and how storage billing works
• Make service choices based on your app profile
• E.g., SQL Azure has a flat fee while Windows Azure Tables charges per transaction
• The service choice can make a big cost difference depending on your app profile (see the comparison sketch below)
• Caching and compressing help a lot with storage costs
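A small back-of-the-envelope sketch of the service-choice point: compare a flat monthly database fee against per-transaction (plus per-GB) table pricing for your own app profile. The prices and the profile below are placeholders, not published rates; plug in current pricing before drawing conclusions.

```python
def flat_fee_cost(monthly_fee: float) -> float:
    """e.g., a SQL-style flat monthly fee."""
    return monthly_fee

def per_transaction_cost(transactions: int, gb_stored: float,
                         price_per_10k_tx: float, price_per_gb: float) -> float:
    """e.g., table storage billed per transaction and per GB stored."""
    return (transactions / 10_000) * price_per_10k_tx + gb_stored * price_per_gb

# Illustrative profile: 50M transactions/month, 20 GB stored; placeholder prices.
flat = flat_fee_cost(monthly_fee=9.99)
tables = per_transaction_cost(transactions=50_000_000, gb_stored=20,
                              price_per_10k_tx=0.01, price_per_gb=0.15)
print(f"flat: ${flat:.2f}/mo  tables: ${tables:.2f}/mo")
```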

Saving Bandwidth Costs
Bandwidth costs are a huge part of any popular web app's billing profile.
Sending fewer things over the wire often means getting fewer things from storage.
Saving bandwidth costs often leads to savings in other places.
Sending fewer things means your VM has time to do other tasks.
All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content
1. Gzip all output content (see the sketch below)
• All modern browsers can decompress on the fly
• Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
• Use Portable Network Graphics (PNGs)
• Crush your PNGs
• Strip needless metadata
• Make all PNGs palette PNGs

[Diagram: uncompressed content passes through Gzip to become compressed content; also minify JavaScript, minify CSS, and minify images.]
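A minimal sketch of the gzip point, shown in Python for brevity (the deck's web roles would do the equivalent in ASP.NET): compress text content once, store or serve the compressed bytes, and let the browser decompress on the fly.

```python
import gzip

def compress_response(body: str) -> bytes:
    """Gzip a text response body before storing or sending it."""
    return gzip.compress(body.encode("utf-8"))

html = "<html><body>" + "hello azure " * 1000 + "</body></html>"
packed = compress_response(html)
print(len(html), "bytes ->", len(packed), "bytes gzipped")
# Serve the packed bytes with the header: Content-Encoding: gzip
```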

Best Practices Summary
Doing 'less' is the key to saving costs.
Measure everything.
Know your application profile inside and out.

                                                                                                                                          Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
(A generic sketch of the map/reduce pattern follows below.)
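To illustrate the programming model in general terms (this is not Daytona's actual API, which is REST-based): a generic map/reduce word count, where the map step emits key/value pairs and the reduce step folds all values for a key.

```python
from collections import defaultdict

def map_step(document: str):
    """Map: emit (word, 1) for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_step(key: str, values: list[int]) -> int:
    """Reduce: fold all counts for one word."""
    return sum(values)

def map_reduce(documents: list[str]) -> dict[str, int]:
    groups: dict[str, list[int]] = defaultdict(list)
    for doc in documents:                 # in the cloud, map calls run on many workers
        for key, value in map_step(doc):
            groups[key].append(value)
    return {key: reduce_step(key, values) for key, values in groups.items()}

print(map_reduce(["the cloud scales", "the cloud stores"]))
```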


                                                                                                                                          Project Daytona - Map Reduce on Azure

                                                                                                                                          httpresearchmicrosoftcomen-usprojectsazuredaytonaaspx

                                                                                                                                          77

Questions and Discussion…

                                                                                                                                          Thank you for hosting me at the Summer School

                                                                                                                                          BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel (see the sketch below)
• Segment the database (e.g., mpiBLAST)
• Needs special result-reduction processing
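A minimal sketch of query segmentation (splitting the input), assuming FASTA-formatted sequences and a hypothetical partition size; each yielded partition would become the input of one parallel BLAST task.

```python
def split_fasta(path, seqs_per_partition=100):
    """Yield lists of FASTA records; each list becomes one BLAST task input.

    100 sequences per partition is the size the AzureBLAST runs later in
    this deck found to work best; treat it as a tunable parameter.
    """
    partition, record = [], []
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">") and record:   # a header starts a new sequence
                partition.append("".join(record))
                record = []
                if len(partition) == seqs_per_partition:
                    yield partition
                    partition = []
            record.append(line)
    if record:                                    # flush the final sequence
        partition.append("".join(record))
    if partition:
        yield partition
```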

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • batch job management
  • task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• the "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • data-transfer overhead
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• too small: repeated computation
• too large: an unnecessarily long wait in case of an instance failure
Best practice:
• Estimate the value from the number of pair-bases in the partition and from test runs
• Watch out for the 2-hour maximum limit
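A sketch of how that estimate might be computed, with a hypothetical calibration constant obtained from test runs; the 2-hour cap reflects the maximum visibility timeout noted above.

```python
MAX_VISIBILITY_TIMEOUT_S = 2 * 60 * 60   # the 2-hour maximum noted above

def estimate_visibility_timeout(pair_bases: int,
                                secs_per_million_pair_bases: float,
                                safety_factor: float = 1.5) -> int:
    """Estimate a task's visibility timeout from its partition size.

    secs_per_million_pair_bases is a hypothetical calibration constant taken
    from profiling test runs. Too small a timeout causes repeated computation;
    too large a timeout delays recovery after an instance failure.
    """
    estimate = (pair_bases / 1e6) * secs_per_million_pair_bases * safety_factor
    return int(min(estimate, MAX_VISIBILITY_TIMEOUT_S))
```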

[Diagram: a Splitting task fans the input out to many BLAST tasks, which run in parallel; a Merging task combines their results.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler, the Scaling Engine, and the database-updating role; a global dispatch queue feeds the Worker instances; Azure Tables hold the Job Registry, and Azure Blob storage holds the NCBI/BLAST databases and temporary data.]


ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
Authentication/authorization is based on Live ID
The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state
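A sketch of the "no in-memory state" idea: persist every accepted job to a durable registry before any processing starts. The table client below is a hypothetical stand-in, not the actual Azure Table storage API.

```python
import uuid
from dataclasses import dataclass, asdict

@dataclass
class JobRecord:
    job_id: str
    owner: str          # Live ID of the submitting user
    status: str         # e.g. "Registered", "Running", "Done", "Failed"
    input_blob: str

def register_job(table, owner: str, input_blob: str) -> str:
    """Persist the accepted job before doing anything else (no in-memory state).

    `table` is a hypothetical durable key-value table client with upsert().
    If the web role instance dies right after this call, the job survives.
    """
    job = JobRecord(job_id=str(uuid.uuid4()), owner=owner,
                    status="Registered", input_blob=input_blob)
    table.upsert(partition_key="jobs", row_key=job.job_id, entity=asdict(job))
    return job.job_id
```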

[Diagram: the Web Portal and Web Service handle job registration; the Job Scheduler, Job Portal, and Scaling Engine work from the jobs recorded in the Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) in four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output results is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should be:
Otherwise something is wrong (e.g., the task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

(Note: task 251774 starts at 8:22 but never reports completion.)
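A sketch of the log analysis this suggests: pair each "Executing the task N" record with its completion record and flag tasks that never finish; the log format is assumed to match the excerpt above.

```python
import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def unfinished_tasks(log_lines):
    """Return task ids that were started but never reported completion."""
    started, finished = set(), set()
    for line in log_lines:
        m = EXEC_RE.search(line)
        if m:
            started.add(int(m.group(1)))
        m = DONE_RE.search(line)
        if m:
            finished.add(int(m.group(1)))
    return sorted(started - finished)

# In the excerpt above, task 251774 starts at 8:22 but never completes,
# which is exactly the anomaly this check would surface.
```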

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes per group): this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed, then the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

Penman-Monteith (1964)
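A direct transcription of the Penman-Monteith form above into a small function, using the symbol definitions from this slide; the default λv value is an illustrative assumption.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2260.0):
    """Evapotranspiration from the Penman-Monteith (1964) form above.

    delta    : rate of change of saturation specific humidity with air temp (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivities of air and of plant stomata (m/s)
    gamma    : psychrometric constant (~66 Pa/K)
    lambda_v : latent heat of vaporization (J/g); 2260 J/g is illustrative only
    """
    return (delta * r_n + rho_a * c_p * dq * g_a) / \
           ((delta + gamma * (1.0 + g_a / g_s)) * lambda_v)
```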

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
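A minimal, self-contained sketch of the four-stage flow just listed, using in-memory queues as stand-ins for the Azure queues in the following diagram; every stage body here is a placeholder, not the real MODIS processing.

```python
from queue import Queue

def data_collection(request):      # download requested input tiles (placeholder)
    return [f"source_tile_{i}" for i in range(3)]

def reproject(tile):               # convert a source tile to a sinusoidal tile (placeholder)
    return tile.replace("source", "sinusoidal")

def derive_et(tiles):              # derivation reduction: compute ET (placeholder)
    return {"et_result": tiles}

def analyze(result):               # optional analysis reduction (placeholder)
    return {"map": result}

def run_pipeline(request):
    download_q, reprojection_q, reduction_q = Queue(), Queue(), Queue()
    download_q.put(request)
    for tile in data_collection(download_q.get()):
        reprojection_q.put(tile)
    sinusoidal = []
    while not reprojection_q.empty():
        sinusoidal.append(reproject(reprojection_q.get()))
    reduction_q.put(derive_et(sinusoidal))
    return analyze(reduction_q.get())

if __name__ == "__main__":
    print(run_pipeline({"tiles": "requested"}))
```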

[Architecture diagram: scientists submit requests to the AzureMODIS Service Web Role Portal via a Request Queue; a Download Queue drives the Data Collection Stage, which pulls imagery from the source download sites; a Reprojection Queue drives the Reprojection Stage; Reduction 1 and Reduction 2 Queues drive the Derivation Reduction and Analysis Reduction Stages; source metadata is tracked throughout, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the request on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work onto the <PipelineStage> Task Queue.]

All the work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
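A sketch of that marshalling step under assumed storage helpers (download_blob/upload_blob are stand-ins, not the real Azure SDK): stage inputs to local files, run the sandboxed executable, and push the results back to blob storage.

```python
import subprocess
import tempfile
from pathlib import Path

def run_task(task, download_blob, upload_blob):
    """Marshal blobs to/from local files around a sandboxed executable.

    `task` carries blob names and the command line; download_blob/upload_blob
    are hypothetical helpers wrapping whatever blob client is in use.
    """
    with tempfile.TemporaryDirectory() as workdir:
        local_in = Path(workdir) / "input.dat"
        local_out = Path(workdir) / "output.dat"
        download_blob(task["input_blob"], local_in)            # stage input locally
        subprocess.run(task["command"] + [str(local_in), str(local_out)],
                       check=True, cwd=workdir)                # run the science exe
        upload_blob(local_out, task["output_blob"])            # publish the result
```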

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read and write the <Input> data storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
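A sketch of the dequeue-and-retry behaviour, under the assumption that the queue message exposes a delivery count (the names below are stand-ins, not a real SDK):

```python
MAX_ATTEMPTS = 3   # failed tasks are retried 3 times

def handle_message(message, execute, mark_failed, delete_message):
    """Process one dequeued task; give up permanently after MAX_ATTEMPTS tries.

    `message.dequeue_count` is assumed to count deliveries of this message so
    far, including the current one.
    """
    if message.dequeue_count > MAX_ATTEMPTS:
        mark_failed(message.task_id)       # record terminal failure in task status
        delete_message(message)            # stop it from circulating forever
        return
    execute(message.task_id)               # may raise; the message then reappears
    delete_message(message)                # delete only after successful execution
```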

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus and placed on the Job Queue; the Service Monitor (Worker Role) parses it, persists ReprojectionTaskStatus entries, and dispatches onto the Task Queue, from which GenericWorker (Worker Role) instances pull tasks; tasks point to the SwathGranuleMeta and ScanTimeList tables and to the Swath Source Data Storage and Reprojection Data Storage.]

• Each job entity specifies a single reprojection job request
• Each task entity specifies a single reprojection task (i.e., a single tile)
• Query SwathGranuleMeta to get geo-metadata (e.g., boundaries) for each swath tile
• Query ScanTimeList to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Architecture diagram: the AzureMODIS Service Web Role Portal fronts the pipeline via a Request Queue; the Data Collection Stage pulls from Source Imagery Download Sites and Source Metadata through a Download Queue; the Reprojection Stage, Derivation Reduction Stage, and Analysis Reduction Stage are driven by the Reprojection, Reduction 1, and Reduction 2 Queues; Scientists retrieve results via Scientific Results Download]

Per-stage scale and cost (as grouped on the slide):

Stage                  Data                               Compute                      Cost
Data Collection        400-500 GB, 60K files, 10 MB/sec   11 hours, <10 workers        $50 upload, $450 storage
Reprojection           400 GB, 45K files                  3500 hours, 20-100 workers   $420 CPU, $60 download
Derivation Reduction   5-7 GB, 55K files                  1800 hours, 20-100 workers   $216 CPU, $1 download, $6 storage
Analysis Reduction     <10 GB, ~1K files                  1800 hours, 20-100 workers   $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                          • Day 2 - Azure as a PaaS
                                                                                                                                          • Day 2 - Applications

Storage Costs

• Understand an application's storage profile and how storage billing works

• Make service choices based on your app profile
  • E.g., SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
  • Service choice can make a big cost difference depending on your app profile

• Caching and compressing content help a lot with storage costs
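To make the flat-fee vs. per-transaction comparison concrete, here is a minimal back-of-the-envelope sketch; the rates, fee, and app profiles are illustrative assumptions, not actual SQL Azure or Windows Azure Table prices.

```python
# Hypothetical rates for illustration only; substitute the rates from your own bill.
FLAT_DB_FEE_PER_MONTH = 9.99            # assumed flat monthly database fee
TABLE_STORAGE_PER_GB_MONTH = 0.15       # assumed $/GB-month for table storage
COST_PER_10K_TRANSACTIONS = 0.01        # assumed $ per 10,000 storage transactions

def monthly_table_cost(gb_stored: float, transactions: int) -> float:
    """Estimated monthly cost of a per-transaction storage service."""
    return gb_stored * TABLE_STORAGE_PER_GB_MONTH + (transactions / 10_000) * COST_PER_10K_TRANSACTIONS

# A "chatty" profile (small data, very many transactions) vs. a "quiet" one.
chatty = monthly_table_cost(gb_stored=1, transactions=50_000_000)   # ~$50.15
quiet = monthly_table_cost(gb_stored=1, transactions=100_000)       # ~$0.25

print(f"chatty profile: ${chatty:.2f}/month vs flat fee ${FLAT_DB_FEE_PER_MONTH}/month")
print(f"quiet profile:  ${quiet:.2f}/month vs flat fee ${FLAT_DB_FEE_PER_MONTH}/month")
```

The point of the sketch is the shape of the comparison, not the numbers: a transaction-heavy profile can make a per-transaction service far more expensive than a flat fee, and vice versa.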

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile

Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places

Sending fewer things also means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content (a minimal sketch follows the diagram below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms

2. Trade off compute costs for storage size

3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content passes through Gzip, Minify JavaScript, Minify CSS, and Minify Images to become Compressed Content]
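As a concrete illustration of item 1 above, a minimal sketch of gzip-compressing content before it is served or written to storage; it uses Python's standard gzip module rather than any Azure-specific API, and the sample HTML is made up.

```python
import gzip

def gzip_bytes(payload: bytes, level: int = 6) -> bytes:
    """Compress response or blob content with gzip before sending or storing it."""
    return gzip.compress(payload, compresslevel=level)

# Repetitive markup compresses very well.
html = ("<html><body>" + "<p>hello azure</p>" * 1000 + "</body></html>").encode("utf-8")
compressed = gzip_bytes(html)
print(f"uncompressed: {len(html)} bytes, gzipped: {len(compressed)} bytes")

# A browser that sends 'Accept-Encoding: gzip' can decompress this on the fly;
# the same bytes can also be written to blob storage to cut storage costs.
```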

Best Practices Summary

Doing 'less' is the key to saving costs

Measure everything

Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                            Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input
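To illustrate query segmentation, a minimal sketch that splits a FASTA query file into fixed-size partitions that can be BLASTed independently; the file name "queries.fasta" and the default partition size are illustrative assumptions, not part of AzureBLAST itself.

```python
def split_fasta(path: str, sequences_per_partition: int = 100):
    """Split a FASTA query file into partitions of N sequences each,
    so each partition can be BLASTed independently (query segmentation)."""
    partitions, current = [], []
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Flush a full partition before starting a new sequence record.
                if len(current) >= sequences_per_partition:
                    partitions.append(current)
                    current = []
                current.append(line)
            elif current:
                current[-1] += line          # sequence data belongs to the last header seen
        if current:
            partitions.append(current)
    return partitions

# Write each partition out as its own FASTA file ("queries.fasta" is a placeholder name).
for i, records in enumerate(split_fasta("queries.fasta")):
    with open(f"partition_{i:04d}.fasta", "w") as out:
        out.writelines(records)
```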

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the general suggested application model: Web Role + Queue + Worker

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), ACM, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data transfer overhead)

Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting period in case of an instance failure

Best practice
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
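A minimal sketch of that estimate, assuming a per-million-pair-bases rate calibrated from test runs; the rate, safety factor, and example numbers are illustrative, and the cap reflects the 2-hour queue limitation noted above.

```python
MAX_VISIBILITY_TIMEOUT_SECONDS = 2 * 60 * 60   # 2-hour maximum noted above

def estimate_visibility_timeout(pair_bases_in_partition: int,
                                seconds_per_million_pair_bases: float,
                                safety_factor: float = 1.5) -> int:
    """Estimate a task's visibility timeout from its partition size and a
    calibration number obtained from test runs, capped at the 2-hour maximum."""
    estimate = (pair_bases_in_partition / 1e6) * seconds_per_million_pair_bases * safety_factor
    return int(min(estimate, MAX_VISIBILITY_TIMEOUT_SECONDS))

# Example: 400 million pair-bases at ~8 s per million pair-bases (from test runs)
# gives 400 * 8 * 1.5 = 4800 s, i.e. an 80-minute visibility timeout.
print(estimate_visibility_timeout(400_000_000, seconds_per_million_pair_bases=8.0))
```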

[Diagram: a Splitting task fans out to BLAST tasks (…), whose outputs are combined by a Merging task]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler, the Job Registry (Azure Table), the Scaling Engine, and a database-updating role; tasks flow through a global dispatch queue to Worker roles; Azure Blob storage holds the BLAST/NCBI databases and temporary data; within a job, a Splitting task fans out to BLAST tasks (…) that a Merging task combines]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state
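A minimal sketch of the "persist the accepted job, avoid in-memory state" idea; the record fields and the table stand-in are illustrative assumptions, not the actual AzureBLAST schema or the Azure table client.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class JobRecord:
    """Minimal job-registry entry (field names are illustrative, not the real schema)."""
    job_id: str
    owner: str            # Live ID of the submitting user
    status: str           # e.g. "Accepted", "Running", "Completed", "Failed"
    submitted_utc: str
    input_blob: str       # where the query sequences were uploaded

class InMemoryTable:
    """Stand-in for the job registry table; any store with insert(dict) would do."""
    def __init__(self):
        self.rows = []
    def insert(self, row: dict) -> None:
        self.rows.append(row)

def register_job(table, job: JobRecord) -> None:
    """Persist the accepted job immediately so no state lives only in memory."""
    table.insert(asdict(job))

registry = InMemoryTable()
register_job(registry, JobRecord("job-0001", "user@example.com", "Accepted",
                                 datetime.now(timezone.utc).isoformat(),
                                 "uploads/job-0001/queries.fasta"))
```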

[Diagram: Web Portal, Web Service, Job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4000 cores
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually

[Figure: the 8 deployments, each running roughly 50-62 extra-large instances]

• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
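A minimal sketch of the kind of log analysis implied here: find tasks that were started but never reported "done" (such as task 251774 above). The parsing assumes only the record format shown in the excerpt.

```python
import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def unfinished_tasks(log_lines):
    """Return task ids that have an 'Executing' record but no matching 'done' record."""
    started, finished = set(), set()
    for line in log_lines:
        if (m := EXEC_RE.search(line)):
            started.add(m.group(1))
        if (m := DONE_RE.search(line)):
            finished.add(m.group(1))
    return started - finished

sample = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(unfinished_tasks(sample))   # {'251774'}: the task that never completed
```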

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain at work
• ~30 mins
• ~6 nodes in one group

35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks completed, then the job was killed

A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" (Irish proverb)

ET  = water volume evapotranspired (m3 s-1 m-2)
Δ   = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv  = latent heat of vaporization (J g-1)
Rn  = net radiation (W m-2)
cp  = specific heat capacity of air (J kg-1 K-1)
ρa  = dry air density (kg m-3)
δq  = vapor pressure deficit (Pa)
ga  = conductivity of air (inverse of ra) (m s-1)
gs  = conductivity of plant stoma air (inverse of rs) (m s-1)
γ   = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

Penman-Monteith (1964)

                                                                                                                                            Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
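A minimal sketch that evaluates the Penman-Monteith expression above; the default constants and the sample inputs are illustrative values only, not measured data from the MODISAzure pipeline.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2260.0):
    """Evaluate ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv).

    delta    : d(saturation specific humidity)/dT      [Pa/K]
    r_n      : net radiation                           [W/m^2]
    rho_a    : dry air density                         [kg/m^3]
    c_p      : specific heat capacity of air           [J/(kg K)]
    dq       : vapor pressure deficit                  [Pa]
    g_a, g_s : conductivities of air and plant stoma   [m/s]
    gamma    : psychrometric constant (~66 Pa/K)
    lambda_v : latent heat of vaporization (~2260 J/g)
    """
    return (delta * r_n + rho_a * c_p * dq * g_a) / ((delta + gamma * (1.0 + g_a / g_s)) * lambda_v)

# Illustrative (not measured) inputs:
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1000.0, g_a=0.02, g_s=0.01))
```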

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram, as earlier: AzureMODIS Service Web Role Portal, Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue, Source Metadata, Source Imagery Download Sites, Data Collection Stage, Reprojection Stage, Derivation Reduction Stage, Analysis Reduction Stage, Scientists, science results, Scientific Results Download]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue (a job-submission sketch follows).

• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks – recoverable units of work.

• Execution status of all jobs and tasks is persisted in Tables.
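A minimal sketch of the front-door enqueue step, assuming the classic Microsoft.WindowsAzure.StorageClient library (Azure SDK ~1.x); the queue names and message format are illustrative, not the actual MODISAzure schema:

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class JobSubmitter
    {
        private readonly CloudQueueClient queueClient;

        public JobSubmitter(string storageConnectionString)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
            queueClient = account.CreateCloudQueueClient();
        }

        // pipelineStage would be something like "download", "reprojection", or "reduction1".
        public void SubmitJob(string pipelineStage, string jobId, string jobPayload)
        {
            CloudQueue jobQueue = queueClient.GetQueueReference(pipelineStage + "-job-queue");
            jobQueue.CreateIfNotExist();   // no-op if the queue already exists
            jobQueue.AddMessage(new CloudQueueMessage(jobId + "|" + jobPayload));
        }
    }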

[Figure: request flow. The MODISAzure Service (Web Role) persists each <PipelineStage>Request as <PipelineStage>JobStatus and places it on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science (or other) executable
• Marshals all storage between Azure blob storage and the local Azure Worker instance files

[Figure: task flow. The Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches tasks onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from the queue and read from <Input>Data Storage.]

The GenericWorker (see the worker-loop sketch below):
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
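A sketch of that dequeue/retry behavior, again assuming the classic Microsoft.WindowsAzure.StorageClient API; RunTask() stands in for the sandboxed executable plus blob marshalling and is not part of any real SDK, and the 30-minute visibility timeout is an illustrative choice:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class GenericWorkerLoop
    {
        private readonly CloudQueue taskQueue;

        public GenericWorkerLoop(string storageConnectionString, string taskQueueName)
        {
            var account = CloudStorageAccount.Parse(storageConnectionString);
            taskQueue = account.CreateCloudQueueClient().GetQueueReference(taskQueueName);
        }

        public void Run()
        {
            while (true)
            {
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null) { Thread.Sleep(5000); continue; }   // queue empty: back off briefly

                if (msg.DequeueCount > 3)
                {
                    // Already failed three times: record the failure (task status table, not shown)
                    // and drop the poison message.
                    taskQueue.DeleteMessage(msg);
                    continue;
                }

                try
                {
                    RunTask(msg.AsString);          // stage blobs locally, run executable, upload results
                    taskQueue.DeleteMessage(msg);   // delete only on success
                }
                catch (Exception)
                {
                    // Leave the message; it becomes visible again after the timeout and is retried.
                }
            }
        }

        private void RunTask(string taskDescription)
        {
            // Placeholder for invoking the sandboxed science executable.
        }
    }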

[Figure: reprojection flow. A Reprojection Request is persisted as ReprojectionJobStatus; the Service Monitor (Worker Role) parses and persists ReprojectionTaskStatus, dispatching from the Job Queue onto the Task Queue consumed by GenericWorker (Worker Role) instances. Tasks point to the ScanTimeList and SwathGranuleMeta tables and to Reprojection Data Storage.]

• Each job entity specifies a single reprojection job request.
• Each task entity specifies a single reprojection task (i.e., a single tile).
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile.
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile.


• Computational costs are driven by the data scale and the need to run the reduction multiple times.
• Storage costs are driven by the data scale and the 6-month project duration.
• Both are small with respect to the people costs, even at graduate-student rates.

[Figure: the MODISAzure pipeline again (Download, Reprojection, Reduction 1, Reduction 2, and Source Metadata Request Queues; Source Imagery Download Sites; AzureMODIS Service Web Role Portal; scientists downloading the scientific results), annotated with per-stage scale and cost:]

Stage                | Data size  | Files     | Time                | Workers        | Cost
Data Collection      | 400-500 GB | 60K files | 11 hours at 10 MB/sec | <10 workers    | $50 upload, $450 storage
Reprojection         | 400 GB     | 45K files | 3500 hours          | 20-100 workers | $420 CPU, $60 download
Derivation Reduction | 5-7 GB     | 55K files | 1800 hours          | 20-100 workers | $216 CPU, $1 download, $6 storage
Analysis Reduction   | <10 GB     | ~1K files | 1800 hours          | 20-100 workers | $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems.

• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access.

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today.

• Clouds provide valuable fault-tolerance and scalability abstractions.

• Clouds act as an amplifier for familiar client tools and on-premises compute.

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.

                                                                                                                                            • Day 2 - Azure as a PaaS
                                                                                                                                            • Day 2 - Applications

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile.

Sending fewer things over the wire often means getting fewer things from storage.

Saving bandwidth costs often leads to savings in other places.

Sending fewer things means your VM has time to do other tasks.

All of these tips have the side benefit of improving your web app's performance and user experience.

Compressing Content

1. Gzip all output content (see the sketch after the figure below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs against storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Figure: Uncompressed Content → Gzip → Compressed Content]

Minify JavaScript
Minify CSS
Minify Images
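A minimal sketch of point 1 above: gzipping a text payload before it is returned or stored, using the standard .NET GZipStream. The helper name and the UTF-8 choice are illustrative; callers would also need to set "Content-Encoding: gzip" on the response.

    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class ContentCompressor
    {
        // Gzip a text payload and return the compressed bytes.
        public static byte[] GzipText(string content)
        {
            byte[] raw = Encoding.UTF8.GetBytes(content);
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(raw, 0, raw.Length);
                } // the GZipStream must be closed before the buffer is read
                return output.ToArray();
            }
        }
    }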

Best Practices Summary

Doing 'less' is the key to saving costs.

Measure everything.

Know your application profile inside and out.

                                                                                                                                              Research Examples in the Cloud

…on another set of slides

                                                                                                                                              Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud: a Hadoop implementation. Hadoop has a long history and has been hardened for stability, but it was originally designed for cluster systems.

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure: designed from the start to use cloud primitives, with built-in fault tolerance and a REST-based interface for writing your own clients.


                                                                                                                                              Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                              Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST): needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations: batch job management; task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)

Best practice: use test runs to profile, and set the partition size to mitigate the overhead (a query-splitting sketch follows).
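A minimal sketch of the query-segmentation step under these granularity considerations, using plain .NET file I/O; the partition-size parameter is the knob the profiling runs would set, and the file naming is illustrative. It assumes the input FASTA file starts with a ">" header line.

    using System.Collections.Generic;
    using System.IO;

    public static class QuerySplitter
    {
        // Split a FASTA input file into partitions of `sequencesPerPartition` sequences each.
        public static List<string> Split(string fastaPath, string outputDir, int sequencesPerPartition)
        {
            var partitions = new List<string>();
            StreamWriter writer = null;
            int sequenceCount = 0, partitionIndex = 0;

            foreach (string line in File.ReadLines(fastaPath))
            {
                bool isHeader = line.StartsWith(">");
                if (isHeader && sequenceCount % sequencesPerPartition == 0)
                {
                    if (writer != null) writer.Dispose();
                    string path = Path.Combine(outputDir, "partition_" + partitionIndex++ + ".fasta");
                    writer = new StreamWriter(path);
                    partitions.Add(path);
                }
                if (isHeader) sequenceCount++;
                writer.WriteLine(line);
            }
            if (writer != null) writer.Dispose();
            return partitions;
        }
    }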

Value of visibilityTimeout for each BLAST task: essentially an estimate of the task run time.
• Too small: repeated computation
• Too large: an unnecessarily long waiting period in case of instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition plus test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
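A sketch of applying that estimate when dequeuing, assuming the classic StorageClient queue API. The minutes-per-megabase rate is a made-up placeholder that a real deployment would calibrate with test runs; the 115-minute cap keeps the value under the 2-hour queue limit.

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BlastTaskDequeue
    {
        // Derive the visibility timeout from partition size; minutesPerMegabase comes from test runs.
        public static TimeSpan EstimateVisibilityTimeout(double megabasesInPartition, double minutesPerMegabase)
        {
            double estimated = megabasesInPartition * minutesPerMegabase;
            double padded = estimated * 1.5;                       // safety margin for slow instances
            return TimeSpan.FromMinutes(Math.Min(padded, 115.0));  // respect the 2-hour maximum
        }

        public static CloudQueueMessage DequeueBlastTask(CloudQueue taskQueue,
                                                         double megabasesInPartition,
                                                         double minutesPerMegabase)
        {
            return taskQueue.GetMessage(EstimateVisibilityTimeout(megabasesInPartition, minutesPerMegabase));
        }
    }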

[Figure: Splitting task → parallel BLAST tasks → Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Figure: AzureBLAST architecture. A Web Portal and Web Service (Web Role) handle job registration; a Job Management Role hosts the Job Scheduler, Scaling Engine, and database-updating role; workers pull work from a global dispatch queue; an Azure Table holds the Job Registry; Azure Blob storage holds the NCBI/BLAST databases and temporary data. On the workers, a Splitting task fans out into parallel BLAST tasks whose outputs feed a Merging task.]

An ASP.NET program hosted by a web role instance:
• Submit jobs
• Track a job's status and logs

Authentication and authorization are based on Live ID.

The accepted job is stored in the job registry table (fault tolerance: avoid in-memory state).


R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences):
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs: discover the interrelationships of known protein sequences.

"All against All" query: the database is also the input query.
• The protein database is large (4.2 GB)
• 9,865,668 sequences in total to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation:
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know; experiments at this scale are usually infeasible for most scientists.

• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• Divided the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When loads become imbalanced, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should be:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
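A sketch of the kind of log mining used here: pair each "Executing" record with its "done" record and flag tasks that never completed. The regular expressions assume the cleaned-up record format shown above; this is plain .NET, not a specific Azure diagnostics API.

    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    public static class TaskLogAnalyzer
    {
        static readonly Regex Start = new Regex(@"(?<node>RD\w+) Executing the task (?<id>\d+)");
        static readonly Regex Done  = new Regex(@"(?<node>RD\w+) Execution of task (?<id>\d+) is done");

        // Returns the ids of tasks that were started but never reported completion.
        public static List<string> FindIncompleteTasks(string logPath)
        {
            var started = new HashSet<string>();
            foreach (string line in File.ReadLines(logPath))
            {
                Match s = Start.Match(line);
                if (s.Success) { started.Add(s.Groups["id"].Value); continue; }

                Match d = Done.Match(line);
                if (d.Success) started.Remove(d.Groups["id"].Value);
            }
            return new List<string>(started);
        }
    }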

North Europe Data Center: 34,256 tasks processed in total.

All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, roughly 30 mins apart): this is an Update Domain at work.

35 nodes experienced blob-writing failures at the same time.

West Europe Data Center: 30,976 tasks were completed, and the job was killed.

A reasonable guess: the Fault Domain is working.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

    ET = \frac{\Delta R_n + \rho_a c_p \, \delta q \, g_a}{\left( \Delta + \gamma \left( 1 + g_a / g_s \right) \right) \lambda_v}

                                                                                                                                              Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
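To make the formula above concrete, here is a minimal C# sketch that evaluates Penman-Monteith for a single pixel. The class and method names are illustrative assumptions, not MODISAzure's production code; parameter names follow the symbol list above.

    // Minimal illustrative sketch (not MODISAzure production code): evaluates the
    // Penman-Monteith formula above for one pixel.
    public static class PenmanMonteithSketch
    {
        public static double ComputeET(
            double delta,        // Δ  : d(saturation specific humidity)/dT (Pa/K)
            double netRadiation, // Rn : net radiation (W/m^2)
            double rhoAir,       // ρa : dry air density (kg/m^3)
            double cp,           // cp : specific heat capacity of air (J/(kg K))
            double deltaQ,       // δq : vapor pressure deficit (Pa)
            double gAir,         // ga : conductivity of air (m/s)
            double gStoma,       // gs : conductivity of plant stomata (m/s)
            double lambdaV,      // λv : latent heat of vaporization (J/g)
            double gamma = 66.0) // γ  : psychrometric constant (Pa/K)
        {
            double numerator = delta * netRadiation + rhoAir * cp * deltaQ * gAir;
            double denominator = (delta + gamma * (1.0 + gAir / gStoma)) * lambdaV;
            return numerator / denominator;   // ET, per the formula above
        }
    }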

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a toy sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
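The nearest-neighbor option mentioned above can be illustrated with a toy resampling routine. This sketch works on a plain 2-D grid and deliberately ignores the sinusoidal map-projection handling the real reprojection stage performs; all names are assumptions for illustration.

    using System;

    public static class ReprojectionSketch
    {
        // Toy nearest-neighbor resampling: each output pixel takes the value of the
        // closest source pixel. Not the actual MODIS reprojection code.
        public static double[,] ResampleNearestNeighbor(double[,] source, int outRows, int outCols)
        {
            int srcRows = source.GetLength(0);
            int srcCols = source.GetLength(1);
            var result = new double[outRows, outCols];

            for (int r = 0; r < outRows; r++)
            {
                for (int c = 0; c < outCols; c++)
                {
                    // Map the output pixel back to the nearest source pixel.
                    int srcR = Math.Min(srcRows - 1, (int)Math.Round(r * (double)srcRows / outRows));
                    int srcC = Math.Min(srcCols - 1, (int)Math.Round(c * (double)srcCols / outCols));
                    result[r, c] = source[srcR, srcC];
                }
            }
            return result;
        }
    }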

[Architecture diagram: the AzureMODIS Service Web Role Portal receives requests on a Request Queue; a Download Queue drives the Data Collection Stage, which pulls source imagery and metadata from the source imagery download sites; a Reprojection Queue drives the Reprojection Stage; Reduction 1 and Reduction 2 Queues drive the Derivation and Analysis Reduction Stages; scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  - Receives all user requests
  - Queues requests to the appropriate Download, Reprojection, or Reduction Job Queue (a sketch of the enqueue step follows below)
• Service Monitor is a dedicated Worker Role
  - Parses all job requests into tasks, i.e. recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
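A minimal sketch of the front-door enqueue step, assuming the classic Microsoft.WindowsAzure.StorageClient API; the queue name is hypothetical and job-status persistence and error handling are omitted.

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class FrontDoorSketch
    {
        public static void QueueReprojectionJob(string jobDescription)
        {
            // Development storage account for illustration; a deployed service
            // would parse its connection string from configuration.
            CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
            CloudQueueClient queueClient = account.CreateCloudQueueClient();

            // Assumes the queue was created at deployment time.
            CloudQueue jobQueue = queueClient.GetQueueReference("reprojection-jobs");
            jobQueue.AddMessage(new CloudQueueMessage(jobDescription));
        }
    }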

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage and from/to local Azure Worker instance files (a sketch follows below)
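A minimal sketch of the marshalling step, again assuming the classic Microsoft.WindowsAzure.StorageClient API. Container, blob, and file names are hypothetical, and the real GenericWorker also records task status and retries.

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class BlobMarshallingSketch
    {
        public static void RunTask(string inputBlobName, string outputBlobName)
        {
            CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
            CloudBlobClient blobClient = account.CreateCloudBlobClient();
            CloudBlobContainer container = blobClient.GetContainerReference("modis-tiles");

            // 1. Copy the input blob to local instance storage
            //    (a real worker would use a configured LocalResource path).
            string localInput = System.IO.Path.GetTempFileName();
            container.GetBlobReference(inputBlobName).DownloadToFile(localInput);

            // 2. Run the sandboxed executable against the local file (omitted).
            string localOutput = localInput + ".out";   // produced by the executable

            // 3. Copy the result back to blob storage.
            container.GetBlobReference(outputBlobName).UploadFile(localOutput);
        }
    }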

[Diagram: the Service Monitor (Worker Role) parses jobs and persists <PipelineStage>TaskStatus; GenericWorker (Worker Role) instances pull dispatched work from the <PipelineStage>Task Queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor (a sketch of the dequeue/retry loop follows below)
• Retries failed tasks 3 times
• Maintains all task status
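A minimal sketch of that dequeue/retry loop, assuming the classic Microsoft.WindowsAzure.StorageClient API. The queue name, the 30-minute timeout, and the poison-task handling are illustrative assumptions.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class GenericWorkerSketch
    {
        public static void ProcessOneMessage(Action<string> runTask)
        {
            CloudQueueClient queueClient =
                CloudStorageAccount.DevelopmentStorageAccount.CreateCloudQueueClient();
            CloudQueue taskQueue = queueClient.GetQueueReference("reprojection-tasks");

            // Hide the message for the estimated task duration (the visibilityTimeout).
            CloudQueueMessage message = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (message == null) return;              // nothing to do right now

            if (message.DequeueCount > 3)
            {
                // Already failed 3 times: treat as a poison task and drop it
                // (the real worker would also record the failure in task status).
                taskQueue.DeleteMessage(message);
                return;
            }

            runTask(message.AsString);                // do the actual work
            taskQueue.DeleteMessage(message);         // only delete on success
        }
    }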

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role), parsed and persisted as ReprojectionTaskStatus entries, and dispatched via the Job and Task Queues to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline diagram with per-stage data volumes and costs; the AzureMODIS Service Web Role Portal fronts the Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues, and scientists download the scientific results.]

Data Collection Stage: 400-500 GB, 60K files; 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files; 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files; 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files; 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                              • Day 2 - Azure as a PaaS
                                                                                                                                              • Day 2 - Applications

Compressing Content

1. Gzip all output content (a sketch follows below)
   • All modern browsers can decompress on the fly
   • Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
   • Use Portable Network Graphics (PNGs)
   • Crush your PNGs
   • Strip needless metadata
   • Make all PNGs palette PNGs

[Diagram: Uncompressed Content passes through Gzip, plus Minify JavaScript, Minify CSS, and Minify Images, to produce Compressed Content.]
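A minimal sketch of the Gzip step using only System.IO and System.IO.Compression; how the compressed bytes are then served or stored is left out, and the caller is hypothetical.

    using System.IO;
    using System.IO.Compression;

    public static class CompressionSketch
    {
        // Gzip a byte buffer in memory before writing it to storage or a response stream.
        public static byte[] GzipBytes(byte[] uncompressed)
        {
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(uncompressed, 0, uncompressed.Length);
                }   // disposing the GZipStream flushes the final compressed block
                return output.ToArray();
            }
        }
    }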

Best Practices Summary

• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out

Research Examples in the Cloud

…on another set of slides

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs on the web
  - Hadoop implementation
  - Hadoop has a long history and has been improved for stability
  - Originally designed for cluster systems
• Microsoft Research is this week announcing a project code-named Daytona for MapReduce jobs on Azure
  - Designed from the start to use cloud primitives
  - Built-in fault tolerance
  - REST-based interface for writing your own clients
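To show the programming pattern itself, here is a single-machine word-count sketch written with plain LINQ. This illustrates generic map/reduce only; it is not Daytona's REST client API, which is not reproduced here.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class MapReduceSketch
    {
        // Map: document -> words. Shuffle: group by word. Reduce: count per word.
        public static Dictionary<string, int> WordCount(IEnumerable<string> documents)
        {
            return documents
                .SelectMany(doc => doc.Split(new[] { ' ', '\t', '\n' },
                                             StringSplitOptions.RemoveEmptyEntries))  // map
                .GroupBy(word => word.ToLowerInvariant())                              // shuffle
                .ToDictionary(group => group.Key, group => group.Count());             // reduce
        }
    }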


Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…

                                                                                                                                                Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  - GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  - Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  - Needs special result-reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  - Split the input sequences (a split sketch follows the citation below)
  - Query the partitions in parallel
  - Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  - Batch job management
  - Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
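A minimal sketch of the query-segmentation split referenced above: divide a FASTA input file into partitions of whole sequences so each partition can be queried by an independent BLAST task. The partition naming and size handling are illustrative assumptions, not AzureBLAST's actual code.

    using System.Collections.Generic;
    using System.IO;

    public static class QuerySegmentationSketch
    {
        public static List<string> SplitFasta(string inputPath, int sequencesPerPartition)
        {
            var partitionFiles = new List<string>();
            StreamWriter writer = null;
            int sequencesInCurrent = 0;

            foreach (string line in File.ReadLines(inputPath))
            {
                // A '>' header line starts a new sequence.
                bool isHeader = line.StartsWith(">");

                // Start a new partition file at the first line, or whenever a new
                // sequence begins and the current partition is already full.
                if (writer == null || (isHeader && sequencesInCurrent >= sequencesPerPartition))
                {
                    if (writer != null) writer.Close();
                    string partitionPath = inputPath + "." + partitionFiles.Count;
                    partitionFiles.Add(partitionPath);
                    writer = new StreamWriter(partitionPath);
                    sequencesInCurrent = 0;
                }

                if (isHeader) sequencesInCurrent++;
                writer.WriteLine(line);
            }

            if (writer != null) writer.Close();
            return partitionFiles;
        }
    }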

A simple Split/Join pattern

Leverage the multi-core capacity of one instance
• The "-a" argument of NCBI-BLAST (a command-line sketch follows below)
• 1, 2, 4, 8 threads for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data-transfer overhead)
• Best Practice: use test runs to profile, and set the partition size to mitigate the overhead
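A minimal sketch of the 1/2/4/8 mapping and the resulting blastall command line; the database name and file paths are placeholders chosen for illustration.

    public static class BlastCommandSketch
    {
        // Map the Azure worker instance size to the NCBI-BLAST "-a" thread count,
        // per the 1/2/4/8 mapping above.
        public static int ThreadsForInstanceSize(string instanceSize)
        {
            switch (instanceSize)
            {
                case "Small":      return 1;
                case "Medium":     return 2;
                case "Large":      return 4;
                case "ExtraLarge": return 8;
                default:           return 1;   // conservative fallback
            }
        }

        public static string BuildCommandLine(string instanceSize, string inputFasta, string outputFile)
        {
            int threads = ThreadsForInstanceSize(instanceSize);
            // blastall flags: -p program, -d database, -i query file, -o output, -a threads
            return string.Format("blastall -p blastp -d nr -i {0} -o {1} -a {2}",
                                 inputFasta, outputFile, threads);
        }
    }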

Choosing the visibilityTimeout value for each BLAST task
• Essentially an estimate of the task run time
  - Too small: repeated computation
  - Too large: an unnecessarily long wait before re-dispatch if the instance fails
• Best Practice
  - Estimate the value from the number of pair-bases in the partition and from test runs (a sketch follows below)
  - Watch out for the 2-hour maximum limitation
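A minimal sketch of such an estimate. The per-million-pair-bases rate and the head-room factor are hypothetical values that would come from profiling test runs; only the 2-hour cap comes from the slide.

    using System;

    public static class VisibilityTimeoutSketch
    {
        public static TimeSpan EstimateVisibilityTimeout(long pairBasesInPartition,
                                                         double secondsPerMillionPairBases)
        {
            double estimatedSeconds =
                (pairBasesInPartition / 1e6) * secondsPerMillionPairBases;

            // Add head-room so a slightly slow task is not re-dispatched prematurely...
            double padded = estimatedSeconds * 1.5;

            // ...but never exceed the 2-hour maximum visibility timeout.
            double capped = Math.Min(padded, TimeSpan.FromHours(2).TotalSeconds);
            return TimeSpan.FromSeconds(capped);
        }
    }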

[Diagram: a Splitting task fans out into many parallel BLAST tasks, which a Merging Task joins once all partitions are done.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work onto a global dispatch queue consumed by Worker instances; a Database Updating Role maintains the databases; Azure Tables hold the Job Registry, and Azure Blob storage holds the BLAST databases, NCBI databases, and temporary data. A Splitting task fans out into parallel BLAST tasks that a Merging Task joins.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state
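A minimal sketch of that submit path, with in-memory stand-ins for the job registry table and the dispatch queue (all names are hypothetical; the point is that the job is persisted before anything else happens, so no state lives only in role memory):

    import json
    import uuid
    from datetime import datetime, timezone

    # In-memory stand-ins for an Azure Table ("JobRegistry") and an Azure Queue
    # ("global-dispatch"); in the real service these would be durable storage.
    job_registry: dict[str, dict] = {}
    dispatch_queue: list[str] = []

    def submit_job(user_id: str, query_blob_url: str, database: str) -> str:
        """Accept a BLAST job: persist it first, then enqueue a pointer to it.

        Keeping all job state in the registry table rather than in role memory
        means a web-role recycle cannot lose an accepted job; the scheduler
        simply re-reads the registry."""
        job_id = str(uuid.uuid4())
        job_registry[job_id] = {
            "PartitionKey": user_id,          # Azure Table partition key
            "RowKey": job_id,                 # Azure Table row key
            "Status": "Registered",
            "QueryBlob": query_blob_url,
            "Database": database,
            "SubmittedUtc": datetime.now(timezone.utc).isoformat(),
        }
        # The queue message carries only the job id; the table stays authoritative.
        dispatch_queue.append(json.dumps({"jobId": job_id}))
        return job_id

    if __name__ == "__main__":
        jid = submit_job("alice", "https://example.blob/query.fasta", "nr")
        print(job_registry[jid]["Status"], len(dispatch_queue))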

[Diagram: the Web Portal and Web Service front ends feed job registration, the Job Scheduler, the Scaling Engine, and the Job Registry.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• BLASTed ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec
AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences
"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically ~100 billion sequence comparisons
Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop
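A quick sanity check of the unit conversion behind that estimate:

    # 3,216,731 minutes expressed in years
    minutes = 3_216_731
    print(f"{minutes / 60 / 24 / 365:.1f} years")   # -> 6.1 years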

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST; each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalance appears, redistribute the load manually

[Chart: allocation of the 475 extra-large VMs across the eight deployments, roughly 50-62 VMs each.]

• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should be a matched "Executing the task N" / "Execution of task N is done" pair; otherwise something is wrong (e.g., the task failed to complete):

    3/31/2010 6:14  RD00155D3611B0  Executing the task 251523
    3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
    3/31/2010 6:25  RD00155D3611B0  Executing the task 251553
    3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
    3/31/2010 6:44  RD00155D3611B0  Executing the task 251600
    3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins
    3/31/2010 8:22  RD00155D3611B0  Executing the task 251774
    3/31/2010 9:50  RD00155D3611B0  Executing the task 251895
    3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

(Note that task 251774 never reports completion.)
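A small sketch of the log analysis implied here, assuming the two-line record format above (the parsing code is illustrative, not the AzureBLAST tooling):

    import re
    from datetime import datetime

    START = re.compile(r"^(?P<ts>\S+ \S+)\s+(?P<node>\S+)\s+Executing the task (?P<task>\d+)")
    DONE = re.compile(r"^(?P<ts>\S+ \S+)\s+(?P<node>\S+)\s+Execution of task (?P<task>\d+) is done")

    def find_lost_tasks(lines):
        """Return task ids that were started but never reported completion."""
        started, finished = {}, set()
        for line in lines:
            if m := START.match(line):
                started[m["task"]] = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M")
            elif m := DONE.match(line):
                finished.add(m["task"])
        return {t: ts for t, ts in started.items() if t not in finished}

    log = [
        "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
        "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
        "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
    ]
    print(find_lost_tasks(log))   # -> task 251774 started at 8:22 but never finished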

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, each group out for ~30 minutes: this is an update domain at work
• 35 nodes experienced blob-writing failures at the same time
West Europe datacenter: 30,976 tasks were completed before the job was killed
• A reasonable guess: the Fault Domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J g^-1)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)
Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs, big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) by plants.
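A small sketch of the formula in code; γ ≈ 66 Pa/K comes from the list above, while λv ≈ 2450 J/g and the sample inputs are merely illustrative:

    def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                           gamma=66.0, lambda_v=2450.0):
        """Evapotranspiration from the Penman-Monteith form shown above.

        delta    : d(saturation specific humidity)/dT   (Pa/K)
        r_n      : net radiation                        (W/m^2)
        rho_a    : dry air density                      (kg/m^3)
        c_p      : specific heat capacity of air        (J/(kg*K))
        dq       : vapor pressure deficit               (Pa)
        g_a, g_s : aerodynamic / stomatal conductivity  (m/s)
        gamma    : psychrometric constant, ~66          (Pa/K)
        lambda_v : latent heat of vaporization          (J/g)
        """
        numerator = delta * r_n + rho_a * c_p * dq * g_a
        denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
        return numerator / denominator

    # Illustrative values only (not taken from the MODIS pipeline).
    print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                             c_p=1005.0, dq=1000.0, g_a=0.02, g_s=0.01))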

[Input datasets:]
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms
Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use
Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
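A compact sketch of how the stages chain through queues (queue names follow the diagram below; the stage handlers and the tile id are placeholders):

    # Input queue -> (stage handler, next queue); None ends the pipeline.
    # Handlers are placeholders standing in for the real map/reduce code.
    def download(tile):   return tile   # fetch source tile from a NASA ftp site
    def reproject(tile):  return tile   # nearest-neighbor / spline resample
    def derive_et(tile):  return tile   # compute ET (Reduction 1)
    def analyze(tile):    return tile   # maps / tables / virtual sensors (Reduction 2)

    PIPELINE = {
        "DownloadQueue":     (download,  "ReprojectionQueue"),
        "ReprojectionQueue": (reproject, "Reduction1Queue"),
        "Reduction1Queue":   (derive_et, "Reduction2Queue"),
        "Reduction2Queue":   (analyze,   None),
    }

    queues = {name: [] for name in PIPELINE}
    queues["DownloadQueue"].append("MOD11A2.h08v05.2009")   # hypothetical tile id

    def pump():
        """Drain each queue in stage order, pushing results to the next stage."""
        for name, (handler, next_q) in PIPELINE.items():
            while queues[name]:
                result = handler(queues[name].pop(0))
                if next_q:
                    queues[next_q].append(result)

    pump()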

[Pipeline diagram: source imagery download sites feed the data collection stage via a Download Queue; the Reprojection, Derivation Reduction (Reduction 1), and Analysis Reduction (Reduction 2) stages are each driven by their own queue; the AzureMODIS Service web role portal accepts requests via a Request Queue and records source metadata; scientists download the scientific results at the end.]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the web role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated worker role
  • Parses all job requests into tasks, the recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables
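A rough sketch of that parse-and-persist step, using in-memory stand-ins for the <PipelineStage>Job/Task queues and status tables (the tile enumeration and field names are hypothetical):

    import json
    import uuid

    # Stand-ins for the ReprojectionJob/Task queues and their status tables.
    reprojection_job_queue: list[str] = [json.dumps({"jobId": "J-1", "tiles": ["h08v05", "h09v05"]})]
    reprojection_task_queue: list[str] = []
    job_status_table: dict[str, dict] = {}
    task_status_table: dict[str, dict] = {}

    def monitor_pass() -> None:
        """One Service Monitor pass: turn each queued job into per-tile tasks.

        Every task is persisted before it is dispatched, so a crashed worker
        (or monitor) can be recovered from table state alone."""
        while reprojection_job_queue:
            job = json.loads(reprojection_job_queue.pop(0))
            job_status_table[job["jobId"]] = {"Status": "Dispatched", "TaskCount": len(job["tiles"])}
            for tile in job["tiles"]:
                task_id = str(uuid.uuid4())
                task_status_table[task_id] = {"JobId": job["jobId"], "Tile": tile, "Status": "Queued"}
                reprojection_task_queue.append(json.dumps({"taskId": task_id, "tile": tile}))

    monitor_pass()
    print(len(reprojection_task_queue), "tasks dispatched")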

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (web role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job queue; the Service Monitor (worker role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task queue.]

All work is actually done by a worker role
• Sandboxes the science (or other) executable
• Marshals all storage to/from Azure blob storage and to/from local Azure worker instance files

[Diagram: the Service Monitor (worker role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task queue; GenericWorker (worker role) instances pull tasks from that queue and read/write the <Input> data storage.]

The GenericWorker
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
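A minimal sketch of that worker loop with in-memory stand-ins; the limit of 3 retries is from the slide, everything else is illustrative:

    import json

    task_queue: list[str] = [json.dumps({"taskId": "T-1", "tile": "h08v05"})]
    task_status_table: dict[str, dict] = {"T-1": {"Status": "Queued", "Retries": 0}}
    MAX_RETRIES = 3

    def run_task(task: dict) -> None:
        """Placeholder for the sandboxed science executable."""
        pass  # e.g. pull inputs from blob storage, run the exe, push outputs back

    def worker_loop() -> None:
        while task_queue:
            task = json.loads(task_queue.pop(0))
            status = task_status_table[task["taskId"]]
            try:
                status["Status"] = "Running"
                run_task(task)
                status["Status"] = "Done"
            except Exception:
                status["Retries"] += 1
                if status["Retries"] <= MAX_RETRIES:
                    task_queue.append(json.dumps(task))   # put it back for another attempt
                    status["Status"] = "Requeued"
                else:
                    status["Status"] = "Failed"           # give up after the third retry

    worker_loop()
    print(task_status_table)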

[Diagram: a reprojection request flows through the Service Monitor (worker role), which persists ReprojectionJobStatus from the job queue, parses and persists ReprojectionTaskStatus, and dispatches to the task queue for GenericWorker (worker role) instances; task entities point into the ScanTimeList and SwathGranuleMeta tables and the reprojection data storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
• Swath source data storage
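For illustration only, a guess at the shape of those entities; the field names are hypothetical, only the table roles above come from the slides:

    from dataclasses import dataclass

    @dataclass
    class ReprojectionTaskStatus:
        """One entity per reprojection task, i.e. one target sinusoidal tile."""
        job_id: str      # parent ReprojectionJobStatus entity
        tile_id: str     # target tile, e.g. "h08v05" (hypothetical format)
        status: str      # Queued / Running / Done / Failed
        retries: int = 0

    @dataclass
    class SwathGranuleMeta:
        """Geo-metadata for one source swath granule."""
        granule_id: str
        scan_time: str
        bounds: tuple    # (min_lon, min_lat, max_lon, max_lat), hypothetical

    def swaths_for_tile(tile_bounds, granules):
        """Pick the source granules whose bounds overlap a target tile."""
        min_lon, min_lat, max_lon, max_lat = tile_bounds
        return [g for g in granules
                if not (g.bounds[2] < min_lon or g.bounds[0] > max_lon or
                        g.bounds[3] < min_lat or g.bounds[1] > max_lat)]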

• Computational costs are driven by data scale and the need to run the reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline diagram (AzureMODIS Service web role portal and stage queues) annotated with per-stage scale and approximate cost:]
• Data collection stage (source imagery download): 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; ~$50 upload, ~$450 storage
• Reprojection stage: 400 GB, 45K files, 3,500 hours, 20-100 workers; ~$420 CPU, ~$60 download
• Derivation reduction stage (Reduction 1): 5-7 GB, 55K files, 1,800 hours, 20-100 workers; ~$216 CPU, ~$1 download, ~$6 storage
• Analysis reduction stage (Reduction 2): <10 GB, ~1K files, 1,800 hours, 20-100 workers; ~$216 CPU, ~$2 download, ~$9 storage
Total: ~$1,420

• Clouds are the largest-scale computer centers ever constructed, and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                • Day 2 - Azure as a PaaS
                                                                                                                                                • Day 2 - Applications

Best Practices Summary
• Doing 'less' is the key to saving costs
• Measure everything
• Know your application profile in and out
Research Examples in the Cloud
• …on another set of slides

Map Reduce on Azure
• Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


Project Daytona - Map Reduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx


Questions and Discussion…
Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences
Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1,000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months
It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result-reduction processing
Large volume of data
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak demand on storage bandwidth could reach 1 TB (100 nodes x a 10 GB database)
• The output of BLAST is usually 10-100x larger than the input

Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results together when done (a splitting sketch follows)
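The query-segmentation pattern amounts to chunking the input FASTA file and later concatenating the per-partition outputs. A minimal C# sketch of the splitting step, assuming a hypothetical `SplitFasta` helper and a fixed number of sequences per partition (the partition file names and the 100-sequence default are illustrative, not the AzureBLAST implementation):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FastaSplitter
{
    // Split a FASTA file into partitions of at most `sequencesPerPartition`
    // sequences each, writing partition_0.fa, partition_1.fa, ... to outputDir.
    public static List<string> SplitFasta(string inputPath, string outputDir, int sequencesPerPartition = 100)
    {
        Directory.CreateDirectory(outputDir);
        var partitionPaths = new List<string>();
        StreamWriter writer = null;
        int sequencesInCurrent = 0, partitionIndex = 0;

        foreach (var line in File.ReadLines(inputPath))
        {
            bool isHeader = line.StartsWith(">");
            if (writer == null || (isHeader && sequencesInCurrent >= sequencesPerPartition))
            {
                writer?.Dispose();
                var path = Path.Combine(outputDir, $"partition_{partitionIndex++}.fa");
                writer = new StreamWriter(path);
                partitionPaths.Add(path);
                sequencesInCurrent = 0;
            }
            if (isHeader) sequencesInCurrent++;
            writer.WriteLine(line);
        }
        writer?.Dispose();
        // Each partition becomes one BLAST task; merging is just concatenating
        // the per-partition output files once all tasks finish.
        return partitionPaths;
    }
}
```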

• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations, including: batch job management; task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• via the "-a" argument of NCBI-BLAST
• set to 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes, respectively (a command-line sketch follows)
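As a rough illustration of the "-a" setting, a sketch that maps the classic Azure instance-size names to the core counts listed above and builds a legacy blastall command line. The executable path, program name, and database name are placeholders, not the actual AzureBLAST configuration:

```csharp
using System;
using System.Collections.Generic;

static class BlastCommandBuilder
{
    // Core counts per classic Azure compute instance size, per the slide: 1, 2, 4, 8.
    static readonly Dictionary<string, int> CoresPerInstanceSize = new Dictionary<string, int>
    {
        { "Small", 1 }, { "Medium", 2 }, { "Large", 4 }, { "ExtraLarge", 8 }
    };

    // Build a blastall invocation that uses all cores of the instance via "-a".
    public static string Build(string instanceSize, string queryPartitionPath, string databaseName)
    {
        int threads = CoresPerInstanceSize[instanceSize];
        return $"blastall.exe -p blastp -a {threads} -d {databaseName} -i {queryPartitionPath} -o {queryPartitionPath}.out";
    }
}
```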

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
• Best practice: profile with test runs and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task's run time
• Too small: repeated computation
• Too large: an unnecessarily long wait in case of instance failure
• Best practice: estimate the value from the number of base pairs in the partition plus test runs, and watch out for the 2-hour maximum limitation (an estimation sketch follows)
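One way to pick the visibilityTimeout is to scale a per-base-pair rate measured in test runs and clamp the result below the 2-hour limit noted above. A minimal sketch; the per-base-pair rate parameter and the 1.5x safety factor are assumptions for illustration:

```csharp
using System;

static class VisibilityTimeoutEstimator
{
    static readonly TimeSpan MaxVisibilityTimeout = TimeSpan.FromHours(2); // 2-hour maximum noted above

    // Estimate a visibility timeout from the partition size (total base pairs) and a
    // per-base-pair processing time measured in test runs, padded by a safety factor.
    public static TimeSpan Estimate(long basePairsInPartition, TimeSpan measuredTimePerBasePair, double safetyFactor = 1.5)
    {
        var estimated = TimeSpan.FromTicks(
            (long)(basePairsInPartition * measuredTimePerBasePair.Ticks * safetyFactor));

        // Too small risks repeated computation; too large delays recovery from an
        // instance failure, and the queue caps the value at 2 hours anyway.
        return estimated < MaxVisibilityTimeout ? estimated : MaxVisibilityTimeout;
    }
}
```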

[Diagram: a Splitting task fans the input out to many BLAST tasks that run in parallel; a Merging task combines their results]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• It fully utilizes the resources

[Architecture diagram: a Web Role (Web Portal, Web Service) performs job registration; a Job Management Role (Job Scheduler, Scaling Engine) dispatches work through a global dispatch queue to Worker instances; an Azure Table holds the Job Registry; Azure Blob storage holds the NCBI/BLAST databases, temporary data, etc.; a Database Updating Role refreshes the databases]


Web Portal / Web Service
• An ASP.NET program hosted by a web role instance: submit jobs, track a job's status and logs
• Authentication/authorization is based on Live ID
• An accepted job is stored in the job registry table, for fault tolerance and to avoid in-memory state (a sketch of such a record follows)
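A sketch of what a job-registry record might look like; the field names are illustrative, not the actual AzureBLAST schema. In the deployed service this maps onto an Azure Table entity (partition key + row key), so a restarted role instance can recover job state instead of relying on in-memory data:

```csharp
using System;

// Illustrative job-registry record; persisted so that no job state lives only
// in role-instance memory.
class BlastJobRecord
{
    public string PartitionKey { get; set; }   // e.g., the submitting user's Live ID (assumed)
    public string RowKey { get; set; }         // e.g., a unique job id (assumed)
    public string Status { get; set; }         // Submitted / Splitting / Running / Merging / Done / Failed
    public string InputBlobUri { get; set; }   // where the query sequences were uploaded
    public string ResultBlobUri { get; set; }  // where merged results are written
    public DateTime SubmittedUtc { get; set; }
    public DateTime? CompletedUtc { get; set; }
}
```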


R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

An "All against All" query
• The database is also the input query
• The protein database is large (4.2 GB); in total 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• The full job would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST, each with its own co-located storage service
• The 10 million sequences were divided into multiple segments, each submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances appeared, the load was redistributed manually


• Total size of the output results: ~230 GB
• Total number of hits: 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record shows a task being executed and then completed; otherwise something is wrong (e.g., the task failed to complete). A parsing sketch follows the excerpt below.

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
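A sketch of the kind of log scan used here: pair each "Executing the task N" line with its "Execution of task N is done" line and report tasks that never completed. The line format is taken from the excerpt above; the parsing code itself is illustrative, not the tool actually used:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class BlastLogChecker
{
    static readonly Regex Started   = new Regex(@"Executing the task (\d+)");
    static readonly Regex Completed = new Regex(@"Execution of task (\d+) is done");

    // Returns the ids of tasks that were started but never logged as done,
    // i.e., the "something is wrong" cases called out above.
    public static List<string> FindIncompleteTasks(string logPath)
    {
        var started = new HashSet<string>();
        foreach (var line in File.ReadLines(logPath))
        {
            var s = Started.Match(line);
            if (s.Success) { started.Add(s.Groups[1].Value); continue; }

            var c = Completed.Match(line);
            if (c.Success) started.Remove(c.Groups[1].Value);
        }
        return new List<string>(started);
    }
}
```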

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, gone for ~30 mins): this is an update domain at work
• 35 nodes experienced a blob-writing failure at the same time

West Europe datacenter: 30,976 tasks were completed, then the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                                                                                                                                  Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants. A worked sketch of the formula follows.
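As a worked form of the Penman-Monteith expression above, a sketch that evaluates ET from the listed inputs. The J/g-to-J/kg conversion for λv follows the unit list given here and is an assumption; the actual MODISAzure code is not shown in this talk:

```csharp
using System;

static class PenmanMonteith
{
    const double Gamma = 66.0; // psychrometric constant γ (Pa K^-1), γ ≈ 66 per the definitions above

    // ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    // delta:                Δ, rate of change of saturation specific humidity with air temperature (Pa K^-1)
    // netRadiation:         Rn (W m^-2)
    // airDensity:           ρa (kg m^-3)
    // cp:                   specific heat capacity of air (J kg^-1 K^-1)
    // vaporPressureDeficit: δq (Pa)
    // ga, gs:               conductivities of air and plant stoma (m s^-1)
    // latentHeatJPerGram:   λv (J g^-1), converted to J kg^-1 below
    public static double Evapotranspiration(
        double delta, double netRadiation, double airDensity, double cp,
        double vaporPressureDeficit, double ga, double gs, double latentHeatJPerGram)
    {
        double lambdaV = latentHeatJPerGram * 1000.0; // J g^-1 -> J kg^-1 (assumed conversion)
        double numerator = delta * netRadiation + airDensity * cp * vaporPressureDeficit * ga;
        double denominator = (delta + Gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }
}
```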

Input data sources
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal accepts requests and source metadata; requests flow through a Request Queue into a Download Queue (Data Collection Stage, pulling from the source imagery download sites), a Reprojection Queue (Reprojection Stage), a Reduction 1 Queue (Derivation Reduction Stage), and a Reduction 2 Queue (Analysis Reduction Stage); scientists download the scientific results]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The MODISAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue (a routing sketch follows)
• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks, which are recoverable units of work
• The execution status of all jobs and tasks is persisted in Tables
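A sketch of the front-door routing just described: pick a job queue by pipeline stage and record job status before enqueueing. The `PipelineStage` enum, the in-memory queues, and the payload format are stand-ins for the real Azure queues and tables:

```csharp
using System;
using System.Collections.Generic;

enum PipelineStage { Download, Reprojection, Reduction1, Reduction2 }

// Illustrative front-door routing: one job queue per pipeline stage, with job
// status recorded before the job is enqueued.
class ModisFrontDoor
{
    readonly Dictionary<PipelineStage, Queue<string>> jobQueues = new Dictionary<PipelineStage, Queue<string>>();

    public ModisFrontDoor()
    {
        foreach (PipelineStage stage in Enum.GetValues(typeof(PipelineStage)))
            jobQueues[stage] = new Queue<string>(); // stands in for an Azure <PipelineStage> Job Queue
    }

    public string SubmitJob(PipelineStage stage, string requestPayload)
    {
        string jobId = Guid.NewGuid().ToString("N");
        // In the real service, a <PipelineStage>JobStatus entity would be persisted to an Azure Table here.
        jobQueues[stage].Enqueue(jobId + ":" + requestPayload);
        return jobId;
    }
}
```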

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage> Task Queue]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read from <Input> Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a worker-loop sketch follows)
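A sketch of that worker loop, with a `runTask` callback standing in for the sandboxed executable and in-memory collections standing in for the Azure task queue and status tables; the 3-retry policy mirrors the bullet above:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative generic-worker loop: dequeue a task, run it, and retry a failed
// task up to 3 times before marking it failed.
class GenericWorkerLoop
{
    const int MaxAttempts = 3;
    readonly Queue<string> taskQueue;
    readonly Dictionary<string, int> attempts = new Dictionary<string, int>();
    readonly Dictionary<string, string> taskStatus = new Dictionary<string, string>();
    readonly Func<string, bool> runTask; // the sandboxed executable invocation goes here

    public GenericWorkerLoop(Queue<string> taskQueue, Func<string, bool> runTask)
    {
        this.taskQueue = taskQueue;
        this.runTask = runTask;
    }

    public void RunOnce()
    {
        if (taskQueue.Count == 0) { Thread.Sleep(1000); return; }

        string taskId = taskQueue.Dequeue();
        attempts[taskId] = attempts.TryGetValue(taskId, out var n) ? n + 1 : 1;
        taskStatus[taskId] = "Running";

        bool succeeded;
        try { succeeded = runTask(taskId); }
        catch { succeeded = false; }

        if (succeeded)
            taskStatus[taskId] = "Done";
        else if (attempts[taskId] < MaxAttempts)
        {
            taskStatus[taskId] = "Retrying";
            taskQueue.Enqueue(taskId); // put it back for another attempt
        }
        else
            taskStatus[taskId] = "Failed";
    }
}
```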

[Diagram: a reprojection request flows to the Service Monitor (Worker Role), which persists ReprojectionJobStatus to the Job Queue's table, parses and persists ReprojectionTaskStatus, and dispatches onto the Task Queue consumed by GenericWorker (Worker Role) instances; tasks point to Reprojection Data Storage, the SwathGranuleMeta and ScanTimeList tables, and Swath Source Data Storage]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile (a query sketch follows)
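A sketch of the two lookup shapes implied by those tables; the entity fields and the in-memory lists standing in for Azure Tables are assumptions for illustration, not the MODISAzure schema:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative entities mirroring the two lookup tables described above.
class SwathGranuleMetaEntity
{
    public string SwathTileId { get; set; }
    public double MinLat { get; set; }  // geo-metadata: bounding box of the swath tile (assumed fields)
    public double MaxLat { get; set; }
    public double MinLon { get; set; }
    public double MaxLon { get; set; }
}

class ScanTimeListEntity
{
    public string TargetTileId { get; set; }
    public DateTime ScanTimeUtc { get; set; } // a satellite scan time covering the target tile
}

static class ReprojectionLookups
{
    // "Query this table to get geo-metadata (e.g., boundaries) for each swath tile."
    public static SwathGranuleMetaEntity GetBoundaries(IEnumerable<SwathGranuleMetaEntity> table, string swathTileId)
        => table.FirstOrDefault(e => e.SwathTileId == swathTileId);

    // "Query this table to get the list of satellite scan times that cover a target tile."
    public static List<DateTime> GetScanTimes(IEnumerable<ScanTimeListEntity> table, string targetTileId)
        => table.Where(e => e.TargetTileId == targetTileId).Select(e => e.ScanTimeUtc).ToList();
}
```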

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Cost figure: resources and costs per MODISAzure pipeline stage]
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                  • Day 2 - Azure as a PaaS
                                                                                                                                                  • Day 2 - Applications

                                                                                                                                                    Research Examples in the Cloud

…on another set of slides

                                                                                                                                                    Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only option for running MapReduce jobs in the cloud
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems

• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure (a minimal illustration of the MapReduce pattern follows below)
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients


                                                                                                                                                    Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx
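The Daytona client API itself is not shown in these slides; as a rough, purely local illustration of the MapReduce idea mentioned above, the following Python sketch maps records to key/value pairs, groups them by key, and reduces each group. All names are illustrative and are not the Daytona or Hadoop API.

from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal local MapReduce: map each record to (key, value) pairs,
    group values by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:                          # map phase
        for key, value in mapper(record):
            groups[key].append(value)               # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}   # reduce phase

# Example: word count, the canonical MapReduce job.
lines = ["the cloud is large", "the cloud is elastic"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)    # {'the': 2, 'cloud': 2, 'is': 2, 'large': 1, 'elastic': 1}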


Questions and Discussion…

                                                                                                                                                    Thank you for hosting me at the Summer School

                                                                                                                                                    BLAST (Basic Local Alignment Search Tool)

• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
• Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
• With 100 nodes, the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern (sketched in code below)
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.
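The published AzureBlast code is not reproduced here; the following is a minimal, purely local Python sketch of the query-segmentation split/join pattern described above, with an in-process queue standing in for an Azure queue and a stub function standing in for the NCBI BLAST invocation. All names (split_fasta, run_blast_partition, worker) are illustrative.

import queue
import threading

def split_fasta(records, partition_size):
    """Splitting task: cut the list of query records into fixed-size partitions."""
    return [records[i:i + partition_size] for i in range(0, len(records), partition_size)]

def run_blast_partition(partition):
    """Stand-in for invoking NCBI BLAST on one partition of query sequences."""
    return [f"hits-for-{seq_id}" for seq_id, _sequence in partition]

def worker(task_queue, results, lock):
    """Worker role: dequeue partitions, run BLAST on each, collect the output."""
    while True:
        try:
            partition = task_queue.get_nowait()
        except queue.Empty:
            return
        hits = run_blast_partition(partition)
        with lock:
            results.extend(hits)          # a real merging task runs after all workers finish
        task_queue.task_done()

records = [(f"seq{i}", "ACGT" * 25) for i in range(1000)]     # toy query set
task_queue = queue.Queue()
for partition in split_fasta(records, partition_size=100):    # 100 sequences per partition
    task_queue.put(partition)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(task_queue, results, lock)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "per-sequence results merged")            # merging task (join)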

A simple Split/Join pattern

Leverage the multiple cores of one instance (see the command sketch after this list)
• Argument "-a" of NCBI BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Too large a partition: load imbalance
• Too small a partition: unnecessary overheads
  • NCBI BLAST start-up overhead
  • Data-transfer overhead

Best practice: use test runs for profiling, and set the partition size to mitigate the overhead
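As a small illustration of the multi-core point above, a worker could derive the legacy NCBI blastall "-a" (number of processors) argument from the instance size it runs on; the size-to-core mapping and the command construction below are a sketch, not code from AzureBlast.

import subprocess   # only needed if the command is actually executed

# Cores available per Windows Azure worker instance size, as described above.
CORES_PER_INSTANCE = {"small": 1, "medium": 2, "large": 4, "extralarge": 8}

def blastall_command(instance_size, query_file, db_path, out_file):
    """Build a legacy NCBI blastall command line that uses every core of the instance."""
    cores = CORES_PER_INSTANCE[instance_size]
    return ["blastall",
            "-p", "blastp",          # protein-protein search
            "-a", str(cores),        # '-a': number of processors to use
            "-i", query_file,
            "-d", db_path,
            "-o", out_file]

cmd = blastall_command("extralarge", "partition_042.fasta", "nr/nr", "partition_042.out")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)    # would run BLAST if the binary and database were installed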

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before another instance picks the task up after an instance failure

Best practice (a sketch follows below)
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
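A minimal sketch of that estimate, assuming a per-pair-base processing rate measured in test runs; the rate constant and safety factor are illustrative placeholders, and the cap reflects the 2-hour visibility-timeout limit noted above.

MAX_VISIBILITY_TIMEOUT_SECS = 2 * 60 * 60          # 2-hour maximum for queue messages

def estimate_visibility_timeout(pair_bases_in_partition,
                                secs_per_million_pair_bases=40.0,
                                safety_factor=1.5):
    """Estimate a visibility timeout (seconds) for one BLAST task.

    pair_bases_in_partition: pair-bases summed over the partition.
    secs_per_million_pair_bases: measured in profiling test runs (illustrative default).
    """
    estimate = (pair_bases_in_partition / 1e6) * secs_per_million_pair_bases * safety_factor
    # Too small -> repeated computation; too large -> long wait after an instance failure.
    return int(min(max(estimate, 60), MAX_VISIBILITY_TIMEOUT_SECS))

print(estimate_visibility_timeout(pair_bases_in_partition=5e8))   # 7200: capped at the 2-hour maximum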

[Diagram: a Splitting task fans out into many BLAST tasks, which are then joined by a Merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[AzureBLAST architecture diagram: a Web Portal and Web Service (Web Role) handle job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work onto a global dispatch queue consumed by Worker instances; job state is kept in Azure Tables (Job Registry), while Azure Blob storage holds the NCBI/BLAST databases and temporary data, refreshed by a database-updating role; each job executes a Splitting task, many BLAST tasks, and a Merging task.]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table (see the sketch below)
• Fault tolerance: avoid in-memory state
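As a sketch of the "avoid in-memory state" point: the accepted job can be written to the registry table before any work starts, so a restarted role can resume from the persisted record. The field names below are illustrative, not the actual AzureBLAST schema, and table_client stands in for whatever table-storage client is in use.

import datetime
import uuid
from dataclasses import dataclass, asdict

@dataclass
class JobRecord:
    """One row in the job registry table (illustrative schema, not the AzureBLAST one)."""
    PartitionKey: str      # e.g. the submitting user's Live ID
    RowKey: str            # unique job id
    status: str            # Submitted -> Splitting -> Running -> Merging -> Done / Failed
    input_blob: str        # blob holding the uploaded query sequences
    submitted_at: str      # UTC timestamp

def register_job(user_id, input_blob, table_client):
    """Persist the accepted job before doing any work, so no state lives only in memory."""
    record = JobRecord(
        PartitionKey=user_id,
        RowKey=str(uuid.uuid4()),
        status="Submitted",
        input_blob=input_blob,
        submitted_at=datetime.datetime.utcnow().isoformat(),
    )
    table_client.create_entity(asdict(record))    # placeholder for the table-storage insert call
    return record.RowKey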

[Diagram detail: the Web Portal and Web Service register jobs through the Job Scheduler, which works with the Scaling Engine and the Job Registry.]

R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• Based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g. the task failed to complete)

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

3/31/2010 8:22  RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50  RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0  Execution of task 251895 is done, it took 82 mins
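A minimal sketch of the kind of log analysis described here: pair each "Executing the task N" line with its "Execution of task N is done" line and flag tasks that never complete (in the excerpt above, task 251774 starts at 8:22 but has no completion record). The regular expressions assume the log format shown above.

import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def find_incomplete_tasks(log_lines):
    """Return task ids that were started but never reported completion."""
    started, completed = set(), set()
    for line in log_lines:
        if m := EXEC_RE.search(line):
            started.add(m.group(1))
        elif m := DONE_RE.search(line):
            completed.add(m.group(1))
    return sorted(started - completed)

log = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(find_incomplete_tasks(log))   # ['251774'] -- this task was silently lost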

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain
• ~30 minutes
• ~6 nodes in one group
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks were completed, and then the job was killed
• A reasonable guess: the fault domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{(\Delta + \gamma\,(1 + g_a/g_s))\,\lambda_v}

Penman-Monteith (1964)
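The Penman-Monteith formula above, transcribed directly into a small Python function; the sample call uses placeholder magnitudes for illustration only, not values from the MODISAzure datasets.

def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2260.0):
    """Evapotranspiration via the Penman-Monteith equation above.

    delta: slope of saturation specific humidity vs. air temperature (Pa/K)
    r_n: net radiation (W/m^2), rho_a: dry air density (kg/m^3)
    c_p: specific heat capacity of air (J/(kg K)), dq: vapor pressure deficit (Pa)
    g_a, g_s: conductivities of air and plant stoma (m/s)
    gamma: psychrometric constant (Pa/K), lambda_v: latent heat of vaporization (J/g)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Illustrative call with placeholder magnitudes (not calibrated values):
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                         c_p=1004.0, dq=1000.0, g_a=0.02, g_s=0.01))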

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, i.e. evaporation through plant membranes, by plants.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[MODISAzure pipeline diagram: the AzureMODIS Service Web Role Portal receives requests via a Request Queue; the Data Collection Stage pulls source imagery and metadata from the download sites through the Download Queue; work then flows through the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue (Reprojection, Derivation Reduction, and Analysis Reduction stages); scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, i.e. recoverable units of work (sketched after the diagram below)
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work items to the <PipelineStage>Task Queue.]
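A sketch of the parse-and-dispatch step the Service Monitor performs, assuming (for illustration only) that a job simply lists the tiles it covers; job_table and task_queue are hypothetical stand-ins for the Azure table and queue clients, not the MODISAzure code.

def parse_and_dispatch(job, job_table, task_queue):
    """Split one pipeline-stage job into recoverable tasks, persist their status, dispatch them.

    job: dict like {"job_id": "...", "stage": "Reprojection", "tiles": ["h08v05", ...]}
    job_table / task_queue: hypothetical storage and queue clients.
    """
    tasks = [{"job_id": job["job_id"],
              "task_id": f'{job["job_id"]}-{i}',
              "stage": job["stage"],
              "tile": tile,
              "status": "Queued"}
             for i, tile in enumerate(job["tiles"])]

    for task in tasks:
        job_table.upsert(task)          # persist <PipelineStage>TaskStatus first...
        task_queue.enqueue(task)        # ...then dispatch to the <PipelineStage>Task Queue
    job_table.upsert({"job_id": job["job_id"], "status": "Dispatched",
                      "task_count": len(tasks)})
    return len(tasks)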

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage to/from Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances consume the queue and read/write the <Input>Data Storage.]

                                                                                                                                                    bull Dequeues tasks created by the Service Monitor

                                                                                                                                                    bull Retries failed tasks 3 times

                                                                                                                                                    bull Maintains all task status
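The worker pattern in these bullets reduces to a small dispatch-and-retry loop. Below is a minimal Python sketch of that loop, assuming local stand-ins for the Azure pieces (a standard-library queue for the task queue, a dict for the status table, a scratch directory as the sandbox); it illustrates the pattern and is not the MODISAzure implementation.

    # Minimal sketch of the generic-worker loop: dequeue a task, run the
    # sandboxed executable in a scratch directory, retry up to 3 times,
    # and record task status.
    import queue
    import subprocess
    import sys
    import tempfile

    MAX_RETRIES = 3  # "Retries failed tasks 3 times"

    task_queue = queue.Queue()
    task_status = {}  # task_id -> "Running" | "Done" | "Failed (attempt n)"


    def run_task(command):
        """Run one pipeline-stage task inside a throwaway working directory."""
        with tempfile.TemporaryDirectory() as workdir:
            # In the real service, input blobs would be copied into workdir here
            # and output files copied back to blob storage afterwards.
            return subprocess.run(command, cwd=workdir).returncode == 0


    def worker_loop():
        while not task_queue.empty():
            task_id, command = task_queue.get()
            task_status[task_id] = "Running"
            for attempt in range(1, MAX_RETRIES + 1):
                if run_task(command):
                    task_status[task_id] = "Done"
                    break
                task_status[task_id] = "Failed (attempt %d)" % attempt
            task_queue.task_done()


    if __name__ == "__main__":
        task_queue.put(("reproject-h08v05",
                        [sys.executable, "-c", "print('reprojecting')"]))
        worker_loop()
        print(task_status)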

[Diagram: reprojection service data flow. A reprojection request enters the job queue; the Service Monitor (worker role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches work onto the task queue; GenericWorker (worker role) instances read the swath source data storage and write the reprojection data storage, consulting the ScanTimeList and SwathGranuleMeta tables.]

• Each job entity specifies a single reprojection job request
• Each task entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run the reductions multiple times
• Storage costs are driven by the data scale and the six-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: AzureMODIS processing pipeline. Source imagery download sites feed a download queue (data collection stage), followed by a reprojection queue, a Reduction 1 queue (derivation reduction stage), and a Reduction 2 queue (analysis reduction stage); source metadata and request queues coordinate the stages, and scientists submit requests and download scientific results through the AzureMODIS Service web role portal.]

Approximate scale and cost by stage:

Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3,500 hours, 20-100 workers; $420 CPU, $60 download
Derivation reduction stage: 5-7 GB, 55K files, 1,800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis reduction stage: <10 GB, ~1K files, 1,800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                                                                    • Day 2 - Applications

Map Reduce on Azure

• Elastic MapReduce on Amazon Web Services has traditionally been the only hosted option for MapReduce jobs on the web
  • Hadoop implementation
  • Hadoop has a long history and has been improved for stability
  • Originally designed for cluster systems
• Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure
  • Designed from the start to use cloud primitives
  • Built-in fault tolerance
  • REST-based interface for writing your own clients
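As a reminder of what the MapReduce programming pattern itself looks like (independent of the Hadoop and Daytona APIs, neither of which is reproduced here), a minimal word-count sketch in Python:

    # Conceptual map/reduce sketch (word count). It shows only the data flow
    # of the pattern: a map phase that emits key/value pairs and a reduce
    # phase that aggregates them by key.
    from collections import defaultdict


    def map_phase(documents):
        for doc in documents:
            for word in doc.split():
                yield word.lower(), 1


    def reduce_phase(pairs):
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value
        return dict(totals)


    if __name__ == "__main__":
        docs = ["clouds scale out", "clouds tolerate faults", "scale matters"]
        print(reduce_phase(map_phase(docs)))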


                                                                                                                                                      Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azuredaytona.aspx


Questions and Discussion…

                                                                                                                                                      Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• One of the most important pieces of software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700-1,000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input (a splitting sketch follows this list)
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result-reduction processing

Large data volumes
• A typical BLAST database can be as large as 10 GB
• With 100 nodes, peak storage bandwidth demand could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input
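Query segmentation is the easiest piece to show concretely. The Python sketch below splits a FASTA file of input sequences into fixed-size partitions that can be queried in parallel; the partition size of 100 sequences echoes the tuning result reported later in this section, and the input file name is hypothetical. It is an illustration, not the AzureBLAST splitter.

    # Sketch of query segmentation: split an input FASTA file into partitions
    # of at most `sequences_per_partition` sequences, one partition per task.
    def split_fasta(path, sequences_per_partition=100):
        partitions, current, record = [], [], []
        with open(path) as handle:
            for line in handle:
                if line.startswith(">") and record:
                    current.append("".join(record))
                    record = []
                    if len(current) == sequences_per_partition:
                        partitions.append(current)
                        current = []
                record.append(line)
        if record:
            current.append("".join(record))
        if current:
            partitions.append(current)
        return partitions


    if __name__ == "__main__":
        parts = split_fasta("queries.fasta")  # hypothetical input file
        print(len(parts), "partitions")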

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the generally suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Too large a partition: load imbalance
• Too small a partition: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • Too small: repeated computation
  • Too large: unnecessarily long waiting time if an instance fails
• Best practice
  • Estimate the value from the number of pair-bases in the partition and from test runs (sketched below)
  • Watch out for the 2-hour maximum limitation
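One way to turn that guidance into code is sketched below in Python: estimate the timeout from the pair-bases in a partition using a calibration constant that would come from profiling test runs, add head-room, and clamp to the 2-hour ceiling. Both constants are placeholders, not measured values.

    # Hedged sketch: estimate a queue visibilityTimeout (in seconds) for a
    # BLAST task from the number of pair-bases in its partition.
    MAX_VISIBILITY_TIMEOUT = 2 * 60 * 60  # the 2-hour limit noted above
    SECONDS_PER_MEGABASE = 90             # hypothetical calibration constant
    SAFETY_FACTOR = 1.5                   # head-room so slow tasks are not re-dispatched


    def estimate_visibility_timeout(pair_bases):
        estimate = SAFETY_FACTOR * SECONDS_PER_MEGABASE * (pair_bases / 1_000_000)
        return int(min(max(estimate, 60), MAX_VISIBILITY_TIMEOUT))


    if __name__ == "__main__":
        print(estimate_visibility_timeout(25_000_000))  # a ~25 Mb partition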

[Diagram: Split/Join task flow - a splitting task fans out into many parallel BLAST tasks, whose outputs are combined by a merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Diagram: AzureBLAST architecture. A web role hosts the web portal and web service for job registration; a job management role runs the job scheduler, scaling engine, and job registry (Azure Table); a global dispatch queue feeds many worker instances; Azure Blob storage holds the NCBI/BLAST databases and temporary data; a database-updating role keeps the databases current. The Split/Join task flow (splitting task, parallel BLAST tasks, merging task) runs on the workers.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state
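The "no in-memory state" rule can be shown in a few lines. In the Python sketch below, sqlite3 stands in for the job registry table and the column names are assumptions, not the actual AzureBLAST schema; the point is simply that an accepted job is durably recorded before anything else happens.

    # Illustrative job-registry write: persist each accepted job so that a web
    # role restart loses no state.
    import sqlite3
    import uuid
    from datetime import datetime, timezone


    def register_job(conn, owner, query_blob_uri):
        conn.execute(
            """CREATE TABLE IF NOT EXISTS job_registry (
                   job_id TEXT PRIMARY KEY,
                   owner TEXT,
                   query_blob_uri TEXT,
                   status TEXT,
                   submitted_utc TEXT)"""
        )
        job_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO job_registry VALUES (?, ?, ?, ?, ?)",
            (job_id, owner, query_blob_uri, "Accepted",
             datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        return job_id


    if __name__ == "__main__":
        with sqlite3.connect("jobs.db") as conn:
            print(register_job(conn, "live-id-user", "https://example.invalid/queries.fasta"))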

[Diagram: the web portal and web service hand accepted jobs to the job scheduler, which records them in the job registry and drives the scaling engine.]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four data centers: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• The total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…

[Figure: per-node task-execution timeline for the run, annotated with the observations below.]

A normal log record should look like:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
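The failure case ("something is wrong") can be detected mechanically. The Python sketch below pairs start and completion records in logs of the format shown above and reports tasks that never completed; the regular expressions assume exactly the phrasing in the excerpt.

    # Flag tasks that have an "Executing the task N" record but no matching
    # "Execution of task N is done" record.
    import re

    EXEC_START = re.compile(r"Executing the task (\d+)")
    EXEC_DONE = re.compile(r"Execution of task (\d+) is done")


    def find_incomplete_tasks(log_lines):
        started, finished = set(), set()
        for line in log_lines:
            done = EXEC_DONE.search(line)
            if done:
                finished.add(done.group(1))
                continue
            start = EXEC_START.search(line)
            if start:
                started.add(start.group(1))
        return sorted(started - finished)


    if __name__ == "__main__":
        sample = [
            "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
            "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
            "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
        ]
        print(find_incomplete_tasks(sample))  # ['251774']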

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group; this is an update domain
  • ~30 mins
  • ~6 nodes in one group
• 35 nodes experienced a blob-writing failure at the same time

West Europe data center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
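Written out as code, the formula is a direct transcription. The Python sketch below follows the symbol list above; the sample inputs in the main block are arbitrary illustrative values, not FLUXNET observations.

    # ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)
    def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                           gamma=66.0, lambda_v=2450.0):
        """delta    -- Δ, slope of saturation specific humidity vs. temperature (Pa/K)
        r_n      -- Rn, net radiation (W/m^2)
        rho_a    -- ρa, dry air density (kg/m^3)
        c_p      -- cp, specific heat capacity of air (J/(kg K))
        dq       -- δq, vapor pressure deficit (Pa)
        g_a, g_s -- ga, gs, conductivities of air and plant stomata (m/s)
        gamma    -- γ, psychrometric constant (about 66 Pa/K)
        lambda_v -- λv, latent heat of vaporization (about 2450 J/g near 20 C)
        """
        numerator = delta * r_n + rho_a * c_p * dq * g_a
        denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
        return numerator / denominator


    if __name__ == "__main__":
        print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                                 c_p=1005.0, dq=1200.0, g_a=0.02, g_s=0.01))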

Input data sources
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a toy nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists submit requests to the AzureMODIS Service Web Role Portal, which feeds a Request Queue; the Data Collection Stage pulls from a Download Queue and the Source Imagery Download Sites; the Reprojection, Derivation Reduction, and Analysis Reduction Stages are driven by the Reprojection, Reduction 1, and Reduction 2 Queues; source metadata is recorded and scientific results are available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
(a minimal sketch of this request-to-task flow follows the diagram below)

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]
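To make the request-to-task flow concrete, here is a minimal Python sketch of the same pattern, using in-memory dicts and queue.Queue as stand-ins for Azure Tables and Queues. The helper names (submit_job, parse_job_into_tasks, service_monitor_step) are invented for illustration and are not part of the MODISAzure code or any Azure SDK.

    # Illustrative sketch only: in-memory stand-ins replace Azure Tables and Queues.
    import queue
    import uuid

    job_status_table = {}       # stand-in for the <PipelineStage>JobStatus table
    task_status_table = {}      # stand-in for the <PipelineStage>TaskStatus table
    job_queue = queue.Queue()   # stand-in for the <PipelineStage>Job queue
    task_queue = queue.Queue()  # stand-in for the <PipelineStage>Task queue

    def submit_job(request):
        """Web-role front door: persist job status, then enqueue the request."""
        job_id = str(uuid.uuid4())
        job_status_table[job_id] = {"request": request, "state": "Queued"}
        job_queue.put({"job_id": job_id, "request": request})
        return job_id

    def parse_job_into_tasks(request):
        """Hypothetical parser: one recoverable task per requested tile."""
        return [{"tile": tile} for tile in request["tiles"]]

    def service_monitor_step():
        """Service-monitor worker: parse a job into tasks and dispatch them."""
        job = job_queue.get()
        for i, task in enumerate(parse_job_into_tasks(job["request"])):
            task_id = f"{job['job_id']}-{i}"
            task_status_table[task_id] = {"state": "Dispatched", **task}
            task_queue.put({"task_id": task_id, **task})
        job_status_table[job["job_id"]]["state"] = "Dispatched"

    # Example: submit a reprojection job for two tiles, then dispatch it.
    jid = submit_job({"stage": "Reprojection", "tiles": ["h08v05", "h09v05"]})
    service_monitor_step()
    print(job_status_table[jid]["state"], task_queue.qsize())  # Dispatched 2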

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (see the sketch below)
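A minimal sketch of the dequeue/execute/retry loop just described, again with an in-memory queue standing in for the Azure task queue; run_executable is a hypothetical placeholder for the sandboxed science executable.

    # Illustrative sketch only: the 3-retry policy mirrors the bullets above.
    import queue

    MAX_ATTEMPTS = 3

    def run_executable(task):
        """Placeholder for launching the science executable on locally staged files."""
        return f"result for {task['task_id']}"

    def generic_worker_loop(task_queue, task_status_table):
        while True:
            try:
                task = task_queue.get(timeout=1)
            except queue.Empty:
                break  # nothing left to do in this sketch
            status = task_status_table.setdefault(task["task_id"], {"attempts": 0})
            status["attempts"] += 1
            try:
                # A real worker would first copy inputs from blob storage to local
                # instance files, and copy outputs back afterwards.
                status["result"] = run_executable(task)
                status["state"] = "Done"
            except Exception as exc:
                if status["attempts"] < MAX_ATTEMPTS:
                    status["state"] = "Retrying"
                    task_queue.put(task)  # retry the failed task
                else:
                    status["state"] = f"Failed: {exc}"

    # Tiny demo: one task succeeds on the first attempt.
    q, statuses = queue.Queue(), {}
    q.put({"task_id": "reproj-0001"})
    generic_worker_loop(q, statuses)
    print(statuses["reproj-0001"]["state"])  # Done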

[Diagram: a Reprojection Request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue; GenericWorker (Worker Role) instances pull tasks and read Swath Source Data Storage and Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
(a minimal lookup sketch follows)
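The two lookup tables can be pictured with a small sketch. The record layouts below are invented for illustration (the real SwathGranuleMeta and ScanTimeList schemas are not given in the talk); the point is only the query pattern: scan times for a target tile, then geo-metadata for those scan times.

    # Illustrative sketch only: lists of dicts stand in for the two Azure tables.
    scan_time_list = [
        {"tile": "h08v05", "scan_time": "2003-05-01T10:30Z"},
        {"tile": "h08v05", "scan_time": "2003-05-01T12:05Z"},
    ]
    swath_granule_meta = [
        {"scan_time": "2003-05-01T10:30Z", "bounds": (-120.0, 30.0, -110.0, 40.0)},
        {"scan_time": "2003-05-01T12:05Z", "bounds": (-118.0, 28.0, -108.0, 38.0)},
    ]

    def swaths_for_tile(target_tile):
        """Find the source swaths whose scan times cover the target sinusoidal tile."""
        times = {r["scan_time"] for r in scan_time_list if r["tile"] == target_tile}
        return [g for g in swath_granule_meta if g["scan_time"] in times]

    print(swaths_for_tile("h08v05"))  # two contributing swaths with their boundaries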

• Computational costs are driven by data scale and the need to run reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[The pipeline diagram is repeated, annotated per stage with the data volumes, compute, and costs listed below.]

Per-stage data volumes, compute, and costs:
• Data Collection Stage: 400-500 GB, 60K files; 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files; 3500 hours, 20-100 workers; $420 cpu, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files; 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files; 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
• Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                      • Day 2 - Azure as a PaaS
                                                                                                                                                      • Day 2 - Applications


Project Daytona - MapReduce on Azure
http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion…

Thank you for hosting me at the Summer School.

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
  • GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST)
  • Needs special result reduction processing

Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.

A simple Split/Join pattern

Leverage the multi-core capability of one instance
• argument "-a" of NCBI-BLAST
• 1, 2, 4, and 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs for profiling and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • too small: repeated computation
  • too large: an unnecessarily long waiting period in case of an instance failure
• Best practice (see the sketch below)
  • Estimate the value based on the number of pair-bases in the partition and on test runs
  • Watch out for the 2-hour maximum limitation
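A hedged sketch of the visibility-timeout estimate described above. The profiling constants are invented; only the shape of the rule matters: proportional to the pair-bases in the partition, padded by a safety factor, and capped at the 2-hour maximum.

    # Illustrative sketch only: the constants below are made-up profiling values.
    MAX_VISIBILITY_TIMEOUT_SEC = 2 * 60 * 60   # queue-imposed 2-hour ceiling
    SECONDS_PER_MEGA_PAIRBASE = 40             # hypothetical rate from test runs
    SAFETY_FACTOR = 1.5                        # pad the estimate to avoid re-delivery

    def estimate_visibility_timeout(pair_bases_in_partition):
        est = (pair_bases_in_partition / 1e6) * SECONDS_PER_MEGA_PAIRBASE * SAFETY_FACTOR
        return int(min(est, MAX_VISIBILITY_TIMEOUT_SEC))

    # Example: a partition totalling ~50 million pair-bases.
    print(estimate_visibility_timeout(50_000_000))  # 3000 seconds (50 minutes)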

[Diagram: a Splitting task fans out to many parallel BLAST tasks, whose outputs are combined by a Merging task; a minimal sketch of this split/join pattern follows.]
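A minimal Python sketch of the split/join pattern. A local process pool stands in for the pool of Azure worker instances, and run_blast is a placeholder rather than a real call to NCBI-BLAST; on Azure each partition would instead be a queued task picked up by a worker role.

    # Illustrative sketch only: split the query, process partitions in parallel, merge.
    from concurrent.futures import ProcessPoolExecutor

    PARTITION_SIZE = 100  # sequences per partition

    def split(sequences, size=PARTITION_SIZE):
        """Splitting task: segment the input query into fixed-size partitions."""
        return [sequences[i:i + size] for i in range(0, len(sequences), size)]

    def run_blast(partition):
        """Placeholder BLAST task: align one partition against the database."""
        return [f"hits for {seq_id}" for seq_id in partition]

    def merge(partial_results):
        """Merging task: join the per-partition outputs into one result set."""
        return [hit for part in partial_results for hit in part]

    if __name__ == "__main__":
        query = [f"seq{i}" for i in range(250)]      # 250 input sequences
        with ProcessPoolExecutor() as pool:          # stand-in for worker instances
            results = list(pool.map(run_blast, split(query)))
        print(len(merge(results)))                   # 250 merged results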

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service (job registration); a Job Management Role runs the Job Scheduler, Scaling Engine, and a Database-updating Role; worker instances pull BLAST tasks from a global dispatch queue; an Azure Table holds the Job Registry, and Azure Blob storage holds the NCBI databases, BLAST databases, temporary data, etc.; a Splitting task fans out to parallel BLAST tasks that are joined by a Merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (a minimal sketch follows)
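A minimal sketch of the "persist before processing" idea: the accepted job goes into a registry table (a dict here stands in for the Azure table) before anything else happens, so a web-role restart can recover pending jobs from the registry rather than from memory. The function names are illustrative only.

    # Illustrative sketch only: a dict stands in for the Job Registry table.
    import uuid

    job_registry = {}  # stand-in for the Job Registry table

    def register_job(user, blast_request):
        job_id = str(uuid.uuid4())
        # Persist first: the registry, not process memory, is the source of truth.
        job_registry[job_id] = {
            "user": user,
            "request": blast_request,
            "state": "Accepted",
        }
        return job_id

    def recover_pending_jobs():
        """After a role restart, resume from the registry, not from memory."""
        return [jid for jid, rec in job_registry.items() if rec["state"] == "Accepted"]

    jid = register_job("alice", {"db": "nr", "sequences": "query.fasta"})
    print(recover_pending_jobs())  # [jid]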

[Diagram: the Web Portal and Web Service pass job registrations to the Job Scheduler; the Job Portal, Scaling Engine, and Job Registry coordinate execution.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
  • 9,865,668 sequences to be queried in total
  • Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances arise, redistribute the load manually


• The total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


                                                                                                                                                        A normal log record should be

                                                                                                                                                        Otherwise something is wrong (eg task failed to complete)

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
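As one way to "look into the log data", the sketch below parses records in the reconstructed format shown above and reports per-task durations plus any task that never reports completion. The regular expressions assume exactly this layout; they are an illustration, not the actual MODISAzure log schema.

```python
import re
import sys
from datetime import datetime

# Assumed record layouts, taken from the excerpt above (not the real MODISAzure schema):
#   "3/31/2010 6:14 RD00155D3611B0 Executing the task 251523"
#   "3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins"
START = re.compile(r"^(\S+ \S+) (\S+) Executing the task (\d+)$")
DONE = re.compile(r"^(\S+ \S+) \S+ Execution of task (\d+) is done")

def analyze(lines):
    started, finished = {}, {}
    for raw in lines:
        line = raw.strip()
        m = START.match(line)
        if m:
            started[m.group(3)] = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M")
            continue
        m = DONE.match(line)
        if m:
            finished[m.group(2)] = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M")
    for task, t0 in sorted(started.items()):
        if task in finished:
            mins = (finished[task] - t0).total_seconds() / 60
            print(f"task {task}: {mins:.1f} mins")
        else:
            # A start with no matching completion hints at a failed task or a lost node.
            print(f"task {task}: no completion record")

if __name__ == "__main__":
    analyze(sys.stdin)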

North Europe Data Center: 34,256 tasks processed in total.
All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, roughly 30 minutes apart: this is an update domain at work.
35 nodes experienced blob-writing failures at the same time.

West Europe Data Center: 30,976 tasks were completed and the job was killed.
A reasonable guess: the fault domain is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / [(Δ + γ·(1 + ga/gs))·λv]

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
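To make the formula concrete, here is a minimal sketch of the Penman-Monteith calculation using the symbols defined above. The sample inputs at the bottom are placeholder values chosen only to exercise the call, not measurements from the MODIS or FLUXNET datasets.

```python
def penman_monteith_et(delta, R_n, rho_a, c_p, dq, g_a, g_s, gamma=66.0, lambda_v=2450.0):
    """ET = (Δ·Rn + ρa·cp·δq·ga) / [(Δ + γ·(1 + ga/gs))·λv]

    Units follow the definitions above: Δ and γ in Pa/K, Rn in W/m², ρa in kg/m³,
    cp in J/(kg·K), δq in Pa, ga and gs in m/s, λv in J/g (≈2450 J/g near 20 °C)."""
    return (delta * R_n + rho_a * c_p * dq * g_a) / ((delta + gamma * (1.0 + g_a / g_s)) * lambda_v)

# Placeholder inputs chosen only to exercise the formula, not real catchment data.
print(penman_monteith_et(delta=145.0, R_n=400.0, rho_a=1.2, c_p=1004.0,
                         dq=1000.0, g_a=0.02, g_s=0.01))
```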

Inputs:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
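For orientation, a small sketch that lists the four stages and the queue each one is fed from; the queue names are illustrative stand-ins, not the service's actual configuration.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    queue: str                 # queue the stage's tasks are dispatched on (illustrative name)
    visible_to_scientist: bool

# The four MODISAzure stages as described above; the queue names are stand-ins.
PIPELINE = [
    Stage("Data Collection", "download-queue", False),
    Stage("Reprojection", "reprojection-queue", False),
    Stage("Derivation Reduction", "reduction1-queue", True),   # first stage scientists see
    Stage("Analysis Reduction", "reduction2-queue", True),     # optional second stage
]

for s in PIPELINE:
    print(f"{s.name:<22} -> {s.queue:<20} visible to scientist: {s.visible_to_scientist}")
```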

[Architecture diagram: the AzureMODIS Service Web Role Portal receives requests on a Request Queue and dispatches work through the Download Queue (Data Collection Stage, pulling from the Source Imagery Download Sites), the Reprojection Queue (Reprojection Stage), and the Reduction 1 and Reduction 2 Queues (Derivation and Analysis Reduction Stages); source metadata is kept in storage, and scientists download the scientific results from the portal.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
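A minimal sketch of that parse-and-dispatch flow, using in-memory dictionaries and a list as stand-ins for the Azure Tables and the per-stage task queue; the function name, field names, and tile identifiers are illustrative, not the MODISAzure API.

```python
import uuid

# In-memory stand-ins for the Azure Tables and the per-stage task queue, used only to
# show the flow; the real service persists <PipelineStage>JobStatus and
# <PipelineStage>TaskStatus in Tables and dispatches on Azure queues.
job_status, task_status, task_queue = {}, {}, []

def submit_job(stage, tiles):
    """Parse one <PipelineStage> job request into recoverable per-tile tasks."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = {"stage": stage, "state": "queued", "tasks": []}
    for tile in tiles:
        task_id = f"{job_id}:{tile}"
        task_status[task_id] = {"state": "pending", "tile": tile}   # persisted task status
        job_status[job_id]["tasks"].append(task_id)
        task_queue.append(task_id)                                  # dispatch to the task queue
    return job_id

job = submit_job("Reprojection", ["h08v05", "h09v05"])              # hypothetical tile names
print(job_status[job]["state"], len(task_queue), "tasks dispatched")
```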

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role); the Service Monitor (Worker Role) persists <PipelineStage>JobStatus, parses and persists <PipelineStage>TaskStatus, and dispatches work onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files
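A hedged sketch of that marshalling pattern: blobs are copied to local instance files, the science executable runs in its own process, and outputs are pushed back to blob storage. The download/upload callables and the executable's -o flag are hypothetical stand-ins, not the MODISAzure implementation.

```python
import pathlib
import subprocess
import tempfile

def run_task(executable, input_blobs, download, upload):
    """Marshall inputs from blob storage to local files, run the science code in a
    separate (sandboxed) process, then push its outputs back to blob storage.
    `download(blob_name, local_path)` and `upload(local_path, blob_name)` are
    stand-ins for blob I/O; the executable and its -o flag are hypothetical."""
    with tempfile.TemporaryDirectory() as tmp:
        work = pathlib.Path(tmp)
        local_inputs = []
        for blob_name in input_blobs:
            local = work / pathlib.Path(blob_name).name
            download(blob_name, local)                      # blob -> local instance file
            local_inputs.append(str(local))
        out_dir = work / "out"
        out_dir.mkdir()
        # A crash in the science code only kills this child process, not the worker.
        subprocess.run([executable, *local_inputs, "-o", str(out_dir)], check=True)
        for produced in out_dir.iterdir():
            upload(produced, f"results/{produced.name}")    # local file -> blob
```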

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
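A sketch of the dequeue/execute/retry cycle just described, with the three-attempt limit; dequeue, requeue, execute, and record_status are stand-ins for the queue, table, and science-code plumbing rather than real Azure SDK calls.

```python
def worker_loop(dequeue, requeue, execute, record_status, max_attempts=3):
    """Skeleton of a generic worker: pull a task, run it, persist status, retry on failure."""
    while True:
        item = dequeue()                      # -> (task_id, payload, attempt) or None
        if item is None:
            break                             # a real worker would keep polling the queue
        task_id, payload, attempt = item
        record_status(task_id, "running")
        try:
            execute(payload)
            record_status(task_id, "done")
        except Exception as err:
            if attempt + 1 < max_attempts:
                requeue((task_id, payload, attempt + 1))          # try again later
                record_status(task_id, f"retry {attempt + 1} scheduled")
            else:
                record_status(task_id, f"failed after {max_attempts} attempts: {err}")
```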

[Diagram: a Reprojection Request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus from the Job Queue, parses and persists ReprojectionTaskStatus, and dispatches onto the Task Queue for GenericWorker (Worker Role) instances; tasks point into Reprojection Data Storage, Swath Source Data Storage, and the ScanTimeList and SwathGranuleMeta tables.]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[The pipeline diagram (AzureMODIS Service Web Role Portal, Request/Download/Reprojection/Reduction 1/Reduction 2 Queues, source metadata, scientific results download) is repeated here, annotated with per-stage data volumes and costs; summarized below.]

Stage                 | Data volume           | Compute / transfer               | Cost
Data Collection       | 400-500 GB, 60K files | 10 MB/sec, 11 hours, <10 workers | $50 upload, $450 storage
Reprojection          | 400 GB, 45K files     | 3500 hours, 20-100 workers       | $420 cpu, $60 download
Derivation Reduction  | 5-7 GB, 55K files     | 1800 hours, 20-100 workers       | $216 cpu, $1 download, $6 storage
Analysis Reduction    | <10 GB, ~1K files     | 1800 hours, 20-100 workers       | $216 cpu, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                                                                        • Day 2 - Applications


Questions and Discussion…

                                                                                                                                                          Thank you for hosting me at the Summer School

BLAST (Basic Local Alignment Search Tool)
• One of the most important pieces of software in bioinformatics
• Identifies similarity between bio-sequences

Computationally intensive
• Large number of pairwise alignment operations
• A single BLAST run can take 700~1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input
  • Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
  • Needs special result-reduction processing

Large data volumes
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage load could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud", in Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
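As a rough sketch of the query-segmentation pattern (split the input, query the partitions in parallel, merge the results), the Python below partitions a multi-FASTA query and concatenates per-partition outputs. The FASTA handling and the 100-sequences-per-partition default are simplifications guided by this section, not the AzureBlast code.

```python
def split_fasta(text, seqs_per_partition=100):
    """Split a multi-FASTA query into partitions of N sequences each.
    100 per partition follows the task-size observation later in this section."""
    sequences, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            sequences.append("\n".join(current))   # close the previous record
            current = []
        current.append(line)
    if current:
        sequences.append("\n".join(current))
    return ["\n".join(sequences[i:i + seqs_per_partition])
            for i in range(0, len(sequences), seqs_per_partition)]

def merge_results(partition_outputs):
    """Join per-partition BLAST outputs into one report (plain concatenation here;
    the real merging task also has to reconcile headers and ordering)."""
    return "\n".join(partition_outputs)
```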

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST start-up overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting in case of an instance failure
• Best practice:
  • Estimate the value based on the number of pair-bases in the partition and test runs
  • Watch out for the 2-hour maximum limitation
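A hedged sketch of how one might pick the visibilityTimeout from profiling data: the linear pair-bases model, the safety factor, and the parameter names are assumptions for illustration, not the actual AzureBlast heuristic; only the 2-hour cap comes from the text above.

```python
MAX_VISIBILITY_TIMEOUT_SECS = 2 * 60 * 60   # the 2-hour queue maximum noted above

def estimate_visibility_timeout(pair_bases, secs_per_megabase, safety_factor=1.5):
    """Estimate a per-task visibilityTimeout from the partition size.
    `secs_per_megabase` would come from profiling test runs; the linear model and
    the safety factor are assumptions, not the AzureBlast implementation."""
    estimate = (pair_bases / 1e6) * secs_per_megabase * safety_factor
    # Too small -> repeated computation; too large -> long waits after an instance
    # failure; and the platform caps the timeout in any case.
    return min(int(estimate), MAX_VISIBILITY_TIMEOUT_SECS)

print(estimate_visibility_timeout(pair_bases=5e8, secs_per_megabase=4.0))
```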

[Diagram: a Splitting task fans the input out to many parallel BLAST tasks, whose outputs are combined by a Merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput
• Fully utilizes the resources

[Architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; the Job Management Role runs the Job Scheduler and Scaling Engine and dispatches work through a global dispatch queue to Worker instances; a Database Updating Role refreshes the NCBI databases; Azure Tables hold the Job Registry and Azure Blobs hold the BLAST databases, NCBI databases, temporary data, etc.]


Web Portal: an ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored into the job registry table
• Fault tolerance: avoid in-memory state (a minimal job-registry sketch follows)
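To make the "no in-memory state" point concrete, here is a minimal sketch of an idempotent job-submission handler that persists every accepted job before acknowledging it. This is not the AzureBLAST code; the JobRegistry class and the field names are illustrative assumptions standing in for an Azure Table.

```python
import uuid
import datetime

class JobRegistry:
    """Illustrative in-memory stand-in for an Azure Table of job records."""
    def __init__(self):
        self._rows = {}                      # job id -> entity

    def insert(self, job_id, entity):
        self._rows[job_id] = entity

    def get(self, job_id):
        return self._rows.get(job_id)

def submit_job(registry, owner, query_blob_url):
    """Persist the job before doing any work, so a web-role restart loses nothing."""
    job_id = str(uuid.uuid4())
    registry.insert(job_id, {
        "JobId": job_id,
        "Owner": owner,                      # Live ID of the submitting user
        "QueryBlob": query_blob_url,         # input sequences already uploaded to blob storage
        "Status": "Registered",
        "SubmittedAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return job_id                            # returned to the portal so the user can track status

registry = JobRegistry()
print(registry.get(submit_job(registry, "user@example.com", "https://.../query.fasta"))["Status"])
```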


Case study: R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec
AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences
• An "all against all" query: the database is also the input query
• The protein database is large (42 GB)
• In total 9,865,668 sequences to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (~6.1 years) on one desktop (see the quick check below)
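A quick sanity check of that figure, assuming a 365-day year:

```python
minutes = 3_216_731
years = minutes / (60 * 24 * 365)   # minutes in one year = 525,600
print(round(years, 1))              # -> 6.1
```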

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4,000 cores: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST; each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution (see the partitioning sketch after this list)
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually
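A minimal sketch of the segment-and-partition idea described above. The segment count matches the 8 deployments mentioned, but the partition size and the demo input size are illustrative assumptions, not the values used in the actual runs.

```python
def partition_sequences(sequence_ids, num_segments, partition_size):
    """Split the full input into one segment per deployment,
    then split each segment into smaller task-sized partitions."""
    segments = [sequence_ids[i::num_segments] for i in range(num_segments)]
    jobs = []
    for deployment, segment in enumerate(segments):
        partitions = [segment[j:j + partition_size]
                      for j in range(0, len(segment), partition_size)]
        jobs.append({"deployment": deployment, "partitions": partitions})
    return jobs

# Demo with 100,000 sequence ids, 8 deployments, 2,000 sequences per partition (illustrative).
jobs = partition_sequences(list(range(100_000)), num_segments=8, partition_size=2_000)
print(len(jobs), len(jobs[0]["partitions"]))   # -> 8 7
```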


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the start/done pairs below; otherwise something is wrong (e.g. a task failed to complete: note that task 251774 is started but never reported done).

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
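Spotting the anomaly programmatically is straightforward. Here is a small sketch, not the tooling actually used, that pairs "Executing" and "is done" records and reports tasks that never completed.

```python
import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def unfinished_tasks(log_lines):
    """Return the ids of tasks that were started but never reported done."""
    started, finished = set(), set()
    for line in log_lines:
        m = EXEC_RE.search(line)
        if m:
            started.add(m.group(1))
            continue
        m = DONE_RE.search(line)
        if m:
            finished.add(m.group(1))
    return sorted(started - finished)

log = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(unfinished_tasks(log))   # -> ['251774']
```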

North Europe Data Center: in total 34,256 tasks processed
• All 62 compute nodes lost tasks and then came back in a group; this is an update domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob-writing failures at the same time
West Europe Datacenter: 30,976 tasks were completed and the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stoma (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs, big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv)

Penman-Monteith (1964)
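A direct transcription of the Penman-Monteith form above into a small function, assuming all inputs are already in the SI units listed. This is only a sketch of the formula, not the MODISAzure reduction code, and the default values for gamma and lambda_v are assumptions (gamma from the list above, lambda_v ≈ 2450 J/g for water near 20 °C).

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2.45e3):
    """Evapotranspiration from the Penman-Monteith formulation.

    delta    : slope of saturation specific humidity vs. air temperature (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : aerodynamic and stomatal conductivities (m/s)
    gamma    : psychrometric constant (Pa/K), ~66 by default
    lambda_v : latent heat of vaporization (J/g), assumed default ~2450 J/g
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator
```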

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.

Input data:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years ≈ 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile
Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a toy nearest-neighbor example follows after this list)
Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use
Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
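As a toy illustration of the nearest-neighbor option mentioned for the reprojection stage (not the actual MODIS sinusoidal reprojection, which involves real map projections; the mapping function here is an assumption):

```python
def nearest_neighbor_resample(src, src_to_dst):
    """Resample a 2-D grid: each destination cell copies the value of the nearest source cell.

    src        : 2-D list of source pixel values
    src_to_dst : function mapping a destination (row, col) to fractional source coordinates
    """
    rows, cols = len(src), len(src[0])
    dst = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = src_to_dst(r, c)
            rr = min(max(int(round(sr)), 0), rows - 1)   # clamp to the source grid
            cc = min(max(int(round(sc)), 0), cols - 1)
            dst[r][c] = src[rr][cc]
    return dst

# Identity mapping leaves the tile unchanged; a real mapping would encode the projection change.
tile = [[1, 2], [3, 4]]
print(nearest_neighbor_resample(tile, lambda r, c: (r, c)))   # -> [[1, 2], [3, 4]]
```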

[Pipeline diagram: Source Imagery Download Sites feed the Data Collection Stage via a Download Queue; a Reprojection Queue feeds the Reprojection Stage; Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; the AzureMODIS Service Web Role Portal accepts requests through a Request Queue, tracks source metadata, and lets scientists download scientific results.]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work (see the parsing sketch below)
  • Execution status of all jobs and tasks is persisted in Tables
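A hedged sketch of what "parse a job request into recoverable tasks" can look like. The field names and the queue/table stand-ins are assumptions for illustration, not the MODISAzure implementation.

```python
import json
import uuid

def parse_job_into_tasks(job_request, task_status_table, task_queue):
    """Expand one pipeline-stage job into per-tile tasks, persisting status before dispatch
    so that any task can be recovered and re-run independently."""
    job_id = job_request["JobId"]
    for tile in job_request["Tiles"]:
        task = {
            "TaskId": str(uuid.uuid4()),
            "JobId": job_id,
            "Stage": job_request["Stage"],           # e.g. "Reprojection"
            "Tile": tile,
            "Status": "Queued",
            "Retries": 0,
        }
        task_status_table[task["TaskId"]] = task      # persist first (recoverability)
        task_queue.append(json.dumps(task))           # then dispatch to the stage's task queue

# Example usage with in-memory stand-ins for the Azure Table and Queue.
table, queue = {}, []
parse_job_into_tasks({"JobId": "job-1", "Stage": "Reprojection", "Tiles": ["h08v05", "h09v05"]},
                     table, queue)
print(len(table), len(queue))   # -> 2 2
```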

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists a <PipelineStage>JobStatus entry and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a GenericWorker (Worker Role)
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor from the <PipelineStage>Task Queue
• Retries failed tasks 3 times
• Maintains all task status
(A worker-loop sketch follows below.)
[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue, from which GenericWorker (Worker Role) instances pull tasks and read/write <Input>Data Storage.]
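A minimal sketch of such a dequeue-and-retry worker loop, using in-memory stand-ins for the queue and status table. This is illustrative only; the real roles use Azure Queue and Table storage, and the 3-retry limit is the only detail taken from the slide.

```python
import json

MAX_RETRIES = 3

def worker_loop(task_queue, task_status_table, run_task):
    """Pull tasks until the queue is empty, retrying each failed task up to MAX_RETRIES times."""
    while task_queue:
        task = json.loads(task_queue.pop(0))
        task_id = task["TaskId"]
        try:
            run_task(task)                              # the sandboxed executable would run here
            task_status_table[task_id]["Status"] = "Done"
        except Exception as err:                        # illustrative catch-all
            task["Retries"] += 1
            if task["Retries"] < MAX_RETRIES:
                task_status_table[task_id]["Status"] = f"Retrying ({task['Retries']})"
                task_queue.append(json.dumps(task))     # put it back for another attempt
            else:
                task_status_table[task_id]["Status"] = f"Failed: {err}"

def always_fails(task):
    raise RuntimeError("boom")

queue = [json.dumps({"TaskId": "t1", "Retries": 0})]
table = {"t1": {"Status": "Queued"}}
worker_loop(queue, table, always_fails)
print(table["t1"]["Status"])   # -> Failed: boom
```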

Reprojection example:
• A Reprojection Request is placed on the Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches tasks via the Task Queue to GenericWorker (Worker Role) instances, which read Swath Source Data Storage and write Reprojection Data Storage.
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
(A toy lookup over these two tables follows.)
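For illustration only, a toy lookup in the spirit of those two tables; the dictionary layouts, tile id, and timestamps are invented stand-ins for the Azure Table schemas.

```python
# Hypothetical in-memory stand-ins for the two metadata tables.
scan_time_list = {
    "h08v05": ["2010-03-31T06:10", "2010-03-31T07:50"],   # scan times covering this target tile
}
swath_granule_meta = {
    "2010-03-31T06:10": {"lat_min": 30.0, "lat_max": 40.0, "lon_min": -120.0, "lon_max": -110.0},
    "2010-03-31T07:50": {"lat_min": 28.0, "lat_max": 38.0, "lon_min": -122.0, "lon_max": -112.0},
}

def swaths_for_tile(tile_id):
    """Return the geo-metadata for every satellite scan that covers the target tile."""
    return [swath_granule_meta[t] for t in scan_time_list.get(tile_id, [])
            if t in swath_granule_meta]

print(len(swaths_for_tile("h08v05")))   # -> 2
```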

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates


• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
Total: ~$1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                          • Day 2 - Azure as a PaaS
                                                                                                                                                          • Day 2 - Applications

BLAST (Basic Local Alignment Search Tool)
• The most important software in bioinformatics
• Identifies similarity between bio-sequences
Computationally intensive
• Large number of pairwise alignment operations
• A BLAST run can take 700-1000 CPU hours
• Sequence databases are growing exponentially
• GenBank doubled in size in about 15 months

It is easy to parallelize BLAST
• Segment the input: segment processing (querying) is pleasingly parallel
• Segment the database (e.g. mpiBLAST): needs special result-reduction processing
Large volume of data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

• Parallel BLAST engine on Azure

• Query-segmentation data-parallel pattern
  • Split the input sequences
  • Query the partitions in parallel
  • Merge the results together when done

• Follows the generally suggested application model
  • Web Role + Queue + Worker Roles

• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga. "AzureBlast: A Case Study of Developing Science Applications on the Cloud." In Proceedings of the 1st Workshop on Scientific Cloud Computing (ScienceCloud 2010), Association for Computing Machinery, 21 June 2010.

A simple Split/Join pattern

Leverage the multiple cores of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST startup overhead
  • Data-transfer overhead

Best practice: profile with test runs and choose a partition size that mitigates these overheads (a splitting sketch follows below)
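To make the splitting step concrete, here is a minimal sketch (not the AzureBLAST source; QuerySegmenter and its behavior are illustrative) that partitions a FASTA input into fixed-size groups of sequences, each of which could become one BLAST task:

    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    // Hypothetical helper: split a FASTA file into partitions of N sequences each,
    // so that every partition can be queued as one BLAST task.
    static class QuerySegmenter
    {
        public static IEnumerable<List<string>> Split(string fastaPath, int sequencesPerPartition)
        {
            var partition = new List<string>();
            var current = new StringBuilder();

            foreach (var line in File.ReadLines(fastaPath))
            {
                // A new ">" header means the previous sequence is complete.
                if (line.StartsWith(">") && current.Length > 0)
                {
                    partition.Add(current.ToString());
                    current.Clear();
                    if (partition.Count == sequencesPerPartition)
                    {
                        yield return partition;
                        partition = new List<string>();
                    }
                }
                current.AppendLine(line);
            }

            // Flush the last sequence and any partially filled partition.
            if (current.Length > 0) partition.Add(current.ToString());
            if (partition.Count > 0) yield return partition;
        }
    }

Each returned partition would then be serialized and enqueued as one task; the partition size is exactly the granularity knob discussed above.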

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: an unnecessarily long wait before another instance picks the task up after an instance failure

Best practice:
• Estimate the value based on the number of pair-bases in the partition and on test runs
• Watch out for the 2-hour maximum limitation
(a dequeue sketch follows this list)
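A minimal sketch of a worker dequeuing a BLAST task with an explicit visibility timeout. It assumes the 2010-era Microsoft.WindowsAzure.StorageClient library (method names may differ in later SDKs); the queue name and the RunBlast helper are hypothetical.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class BlastWorkerLoop
    {
        static void Main()
        {
            // Assumes a storage connection string is available; "tasks" is an illustrative queue name.
            var account = CloudStorageAccount.Parse("<your storage connection string>");
            var queue = account.CreateCloudQueueClient().GetQueueReference("tasks");

            // Estimated run time for this partition size, e.g. from profiling test runs.
            // Too small: another worker re-runs the task; too large: a failed instance's
            // task stays invisible for a long time. The hard ceiling here is 2 hours.
            TimeSpan visibilityTimeout = TimeSpan.FromMinutes(30);

            CloudQueueMessage msg = queue.GetMessage(visibilityTimeout);
            if (msg != null)
            {
                RunBlast(msg.AsString);       // hypothetical: run NCBI-BLAST on the named partition
                queue.DeleteMessage(msg);     // delete only after the task really finished
            }
        }

        static void RunBlast(string partitionId) { /* placeholder */ }
    }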

[Task-flow diagram: a Splitting task fans out into many BLAST tasks that run in parallel, followed by a Merging task.]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[Architecture diagram: a Web Portal and Web Service (Web Role) handle job registration into a Job Registry held in Azure Tables; the Job Scheduler in the Job Management Role, together with a Scaling Engine and a database-updating Role, dispatches work through a global dispatch queue to Worker instances; Azure Blobs hold the NCBI/BLAST databases and temporary data. Task flow: a Splitting task fans out into parallel BLAST tasks, followed by a Merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (a persistence sketch follows below)
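A sketch of what "store the accepted job in the job registry table" might look like, assuming the same-era StorageClient table API; JobEntity, the JobRegistry table name, and the property set are illustrative, not the actual AzureBLAST schema.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // Illustrative entity: one row per submitted job, keyed by owner and job id.
    public class JobEntity : TableServiceEntity
    {
        public JobEntity() { }               // required by the table client
        public JobEntity(string owner, string jobId)
        {
            PartitionKey = owner;            // e.g. the Live ID of the submitting user
            RowKey = jobId;
            Status = "Accepted";
            SubmittedAt = DateTime.UtcNow;
        }
        public string Status { get; set; }
        public DateTime SubmittedAt { get; set; }
    }

    class JobRegistry
    {
        public static void Register(CloudStorageAccount account, JobEntity job)
        {
            var tables = account.CreateCloudTableClient();
            tables.CreateTableIfNotExist("JobRegistry");

            // Persist immediately; if the web role instance dies, the job is not lost.
            var ctx = tables.GetDataServiceContext();
            ctx.AddObject("JobRegistry", job);
            ctx.SaveChangesWithRetries();
        }
    }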

[Diagram: the Job Portal (Web Portal / Web Service) performs job registration into the Job Registry; the Job Scheduler and Scaling Engine pick up registered jobs.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time...

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against all" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences in total to be queried
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place...


A normal log record should look like this (a small log-scanning sketch follows these records):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
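One simple way to find such anomalies is to scan the logs for tasks that have an "Executing" record but no matching "is done" record. The sketch below is illustrative, not the analysis tool actually used:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    class LogScan
    {
        static void Main(string[] args)
        {
            // Matches lines like:
            //   3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
            //   3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
            var started = new Regex(@"Executing the task (\d+)");
            var finished = new Regex(@"Execution of task (\d+) is done");

            var open = new HashSet<string>();
            foreach (var line in File.ReadLines(args[0]))
            {
                Match m;
                if ((m = started.Match(line)).Success) open.Add(m.Groups[1].Value);
                else if ((m = finished.Match(line)).Success) open.Remove(m.Groups[1].Value);
            }

            // Whatever is left started but never reported completion on this node.
            foreach (var taskId in open)
                Console.WriteLine("Task {0} has no completion record", taskId);
        }
    }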

North Europe Data Center: 34,256 tasks processed in total

All 62 compute nodes lost tasks and then came back as a group; this is an Update Domain (~30 mins, ~6 nodes per group)

35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed before the job was killed

A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, that is, evaporation through plant membranes by plants.
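For concreteness, the formula transcribes directly into code; the function below follows the symbol list above and performs no unit conversion or input validation.

    static class PenmanMonteith
    {
        // Direct transcription of the equation above. Arguments use the units
        // given in the symbol list.
        public static double ET(
            double delta,    // Δ  : slope of saturation specific humidity vs. air temperature (Pa/K)
            double rn,       // Rn : net radiation (W/m2)
            double rhoA,     // ρa : dry air density (kg/m3)
            double cp,       // cp : specific heat capacity of air (J/(kg K))
            double dq,       // δq : vapor pressure deficit (Pa)
            double ga,       // ga : conductivity of air (m/s)
            double gs,       // gs : conductivity of plant stoma (m/s)
            double gamma,    // γ  : psychrometric constant (≈ 66 Pa/K)
            double lambdaV)  // λv : latent heat of vaporization (J/g)
        {
            return (delta * rn + rhoA * cp * dq * ga)
                 / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
        }
    }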

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
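As a toy illustration of the nearest-neighbor option in the reprojection stage (the real MODIS reprojection pipeline is considerably more involved), the sketch below resamples a source raster onto a target grid given caller-supplied coordinate mappings:

    using System;

    static class NearestNeighborReproject
    {
        // Resample a source raster onto a target grid. sourceRowOf/sourceColOf map a
        // target pixel (r, c) to fractional source coordinates; nearest neighbor just
        // rounds to the closest source pixel. Pixels that fall outside the source
        // raster keep the fill value.
        public static double[,] Resample(
            double[,] source,
            int targetRows, int targetCols,
            Func<int, int, double> sourceRowOf,
            Func<int, int, double> sourceColOf,
            double fill)
        {
            var target = new double[targetRows, targetCols];
            for (int r = 0; r < targetRows; r++)
                for (int c = 0; c < targetCols; c++)
                {
                    int i = (int)Math.Round(sourceRowOf(r, c));
                    int j = (int)Math.Round(sourceColOf(r, c));
                    target[r, c] = (i >= 0 && i < source.GetLength(0) &&
                                    j >= 0 && j < source.GetLength(1))
                                   ? source[i, j] : fill;
                }
            return target;
        }
    }

The coordinate-mapping delegates stand in for the sinusoidal-projection math, which is where the real geospatial work lives.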

[Architecture diagram: scientists interact with the AzureMODIS Service Web Role Portal; requests flow from the Request Queue to the Download Queue (Data Collection Stage, pulling from source imagery download sites), then to the Reprojection Queue (Reprojection Stage), the Reduction 1 Queue (Derivation Reduction Stage), and the Reduction 2 Queue (Analysis Reduction Stage); source metadata is kept alongside the queues, and scientific results are available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks onto the <PipelineStage> Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage from/to Azure blob storage to/from local files on the Azure Worker instance

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input> Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(a schematic worker loop follows this list)
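A schematic of that dequeue-and-retry behavior, assuming the storage client's DequeueCount property; ProcessTask and RecordStatus are placeholders, and the visibility timeout is illustrative:

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    class GenericWorkerLoop
    {
        const int MaxAttempts = 3;

        static void Run(CloudQueue taskQueue)
        {
            while (true)
            {
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null) { System.Threading.Thread.Sleep(5000); continue; }

                // DequeueCount tells us how many times this task has already been handed out.
                if (msg.DequeueCount > MaxAttempts)
                {
                    RecordStatus(msg, "Failed");     // give up after 3 retries
                    taskQueue.DeleteMessage(msg);
                    continue;
                }

                try
                {
                    ProcessTask(msg.AsString);       // run the sandboxed science executable
                    RecordStatus(msg, "Succeeded");
                    taskQueue.DeleteMessage(msg);    // delete only once the work is done
                }
                catch (Exception)
                {
                    // Leave the message alone: it reappears after the visibility timeout
                    // and another worker (or this one) will retry it.
                }
            }
        }

        static void ProcessTask(string task) { /* placeholder */ }
        static void RecordStatus(CloudQueueMessage msg, string status) { /* placeholder */ }
    }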

[Diagram: a Reprojection Request reaches the Service Monitor (Worker Role), which persists ReprojectionJobStatus (each entity specifies a single reprojection job request) and parses and persists ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile) before dispatching onto the Task Queue for GenericWorker (Worker Role) instances. Each task points to the SwathGranuleMeta table, queried for geo-metadata (e.g. boundaries) for each swath tile, and the ScanTimeList table, queried for the list of satellite scan times that cover a target tile; workers read Swath Source Data Storage and write Reprojection Data Storage.]

• Computational costs are driven by the data scale and the need to run the reduction multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Diagram: MODISAzure pipeline with per-stage data volumes and costs. Scientists submit requests through the AzureMODIS Service Web Role Portal (Request Queue, Source Metadata); the Download Queue feeds the Data Collection Stage from the Source Imagery Download Sites; the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue feed the Reprojection, Derivation Reduction, and Analysis Reduction Stages; results are available for Scientific Results Download.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large and small scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                            • Day 2 - Azure as a PaaS
                                                                                                                                                            • Day 2 - Applications

It is easy to parallelize BLAST
• Segment the input
• Segment processing (querying) is pleasingly parallel
• Segment the database (e.g., mpiBLAST)
• Needs special result reduction processing

Large volume data
• A normal BLAST database can be as large as 10 GB
• 100 nodes means the peak storage bandwidth could reach 1 TB
• The output of BLAST is usually 10-100x larger than the input

Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results together when done (see the sketch after the citation below)
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations: batch job management, task parallelism on an elastic cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
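To make the query-segmentation pattern concrete, here is a minimal split/join sketch in Python; it is not the AzureBLAST implementation. It assumes a local NCBI BLAST+ `blastp` binary and a formatted database named "nr"; the file names, partition size, and the use of local processes in place of Azure worker roles are illustrative assumptions.

```python
# Minimal split/join sketch of the query-segmentation pattern (illustrative,
# not the AzureBLAST code). Assumes a local NCBI BLAST+ `blastp` binary and a
# formatted database named "nr"; adjust paths and names as needed.
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

SEQS_PER_PARTITION = 100  # the talk found ~100 sequences per partition worked well

def _write_partition(out_dir, index, lines):
    path = out_dir / f"partition_{index:05d}.fasta"
    path.write_text("".join(lines))
    return path

def split_fasta(fasta_path, out_dir, seqs_per_partition=SEQS_PER_PARTITION):
    """Split a FASTA query file into partition files of N sequences each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(exist_ok=True)
    partitions, current, count = [], [], 0
    with open(fasta_path) as f:
        for line in f:
            if line.startswith(">") and count and count % seqs_per_partition == 0:
                partitions.append(_write_partition(out_dir, len(partitions), current))
                current = []
            if line.startswith(">"):
                count += 1
            current.append(line)
    if current:
        partitions.append(_write_partition(out_dir, len(partitions), current))
    return partitions

def run_blast(partition_path):
    """Run one BLAST task over a single partition (one 'worker' unit of work)."""
    out_path = partition_path.with_suffix(".out")
    subprocess.run(
        ["blastp", "-db", "nr", "-query", str(partition_path), "-out", str(out_path)],
        check=True,
    )
    return out_path

def merge_results(result_paths, merged_path):
    """Join step: concatenate per-partition outputs into one result file."""
    with open(merged_path, "w") as merged:
        for path in result_paths:
            merged.write(Path(path).read_text())

if __name__ == "__main__":
    parts = split_fasta("query.fasta", "partitions")
    with ProcessPoolExecutor() as pool:   # stands in for Azure worker role instances
        results = list(pool.map(run_blast, parts))
    merge_results(results, "blast_results.out")
```

In AzureBLAST the same three steps map onto a splitting task, many queued BLAST tasks picked up by worker roles, and a merging task.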

A simple Split/Join pattern

Leverage the multiple cores of one instance
• Argument "-a" of NCBI-BLAST
• Set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads (NCBI-BLAST startup overhead, data transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task: essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting period in case of an instance failure
• Best practice: estimate the value based on the number of pair-bases in the partition and on test runs (see the sketch below)
• Watch out for the 2-hour maximum limitation
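A minimal sketch of that estimate, assuming a throughput calibrated from test runs; the calibration constant and safety factor below are hypothetical, and the 2-hour cap reflects the maximum limitation noted above.

```python
# Sketch: choose a visibilityTimeout (minutes) for a BLAST task from the
# partition size. CALIBRATED_BASES_PER_MIN and SAFETY_FACTOR are hypothetical;
# in practice they would come from profiling test runs as suggested above.
MAX_VISIBILITY_TIMEOUT_MIN = 120        # the 2-hour maximum limitation
CALIBRATED_BASES_PER_MIN = 250_000      # measured from test runs (illustrative)
SAFETY_FACTOR = 1.5                     # margin so the estimate is rarely too small

def estimate_visibility_timeout(total_pair_bases_in_partition: int) -> int:
    """Return a visibility timeout in minutes for one BLAST task."""
    estimate = total_pair_bases_in_partition / CALIBRATED_BASES_PER_MIN
    padded = estimate * SAFETY_FACTOR
    # Too small -> repeated computation; too large -> long waits after a
    # failure; and the platform caps the timeout at 2 hours regardless.
    return min(int(padded) + 1, MAX_VISIBILITY_TIMEOUT_MIN)

print(estimate_visibility_timeout(10_000_000))   # e.g. a 10M-base partition
```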

[Diagram: Splitting task → BLAST tasks in parallel → Merging task]

Task size vs. performance
• Benefit of the warm cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instance sizes
• Primarily due to the memory capacity

Task size/instance size vs. cost
• The extra-large instance generated the best and most economical throughput (see the sketch below)
• It fully utilizes the resources
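One way to reproduce the "most economical throughput" comparison is to compute sequences processed per dollar for each instance size; the sketch below does that with placeholder prices and throughputs, which are not figures from the study.

```python
# Sketch: compare cost efficiency across instance sizes. All numbers below are
# placeholders for illustration only; real values would come from test runs
# and the current Azure price list.
profiles = {
    # size: (hypothetical $/hour, hypothetical sequences processed per hour)
    "small":       (0.12,  400),
    "medium":      (0.24,  900),
    "large":       (0.48, 2000),
    "extra-large": (0.96, 4600),   # super-linear speedup, per the observations above
}

for size, (dollars_per_hour, seqs_per_hour) in profiles.items():
    seqs_per_dollar = seqs_per_hour / dollars_per_hour
    print(f"{size:>11}: {seqs_per_dollar:,.0f} sequences per dollar")
```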

[Diagram: AzureBLAST architecture. A Web Role hosts the Web Portal and Web Service (job registration); a Job Management Role runs the Job Scheduler, Scaling Engine, and Database updating Role; worker roles pull tasks from a global dispatch queue; Azure Tables hold the Job Registry, and Azure Blob storage holds the NCBI/BLAST databases, temporary data, etc. The job itself follows the split/join pattern: Splitting task → BLAST tasks in parallel → Merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track each job's status and logs
Authentication/authorization is based on Live ID
The accepted job is stored into the job registry table
• Fault tolerance: avoid in-memory states (a minimal sketch of such a record follows)
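A minimal sketch of a job-registry record kept out of memory, assuming illustrative field names and an in-memory dictionary standing in for the Azure table; this is not the AzureBLAST schema.

```python
# Sketch: persist accepted jobs to a registry instead of holding them in
# memory, so a restarted web/worker role can recover state. Field names and
# the in-memory "table" stand-in are illustrative.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class JobRecord:
    job_id: str
    owner_live_id: str          # submitter identity (Live ID in the talk)
    database: str               # e.g. "nr"
    input_blob_uri: str         # where the query sequences were uploaded
    status: str = "Accepted"    # Accepted -> Splitting -> Running -> Merging -> Done
    submitted_at: str = ""

# Stand-in for an Azure table; a real deployment would upsert the dictionary
# returned by asdict() into table storage keyed by job_id.
job_registry = {}

def register_job(record: JobRecord) -> None:
    record.submitted_at = datetime.now(timezone.utc).isoformat()
    job_registry[record.job_id] = asdict(record)

def update_status(job_id: str, new_status: str) -> None:
    job_registry[job_id]["status"] = new_status   # read-modify-write against the table

register_job(JobRecord("job-001", "user@example.com", "nr",
                       "https://examplestorage.blob.core.windows.net/queries/job-001.fasta"))
update_status("job-001", "Running")
print(job_registry["job-001"])
```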

[Diagram highlight: the Web Portal and Web Service (job registration) feed the Job Scheduler; the Job Portal, Scaling Engine, and Job Registry complete the management path.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on a sampling run on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop (see the sketch below)
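The extrapolation itself is straightforward; a sketch of scaling a sampling run up to the full input is shown below, where the sample size and sample time are hypothetical and only the total sequence count comes from the figures above.

```python
# Sketch: extrapolate total run time from a small sampling run. The sample
# figures are hypothetical; only TOTAL_SEQUENCES comes from the slide above.
SAMPLE_SEQUENCES = 1_000            # sequences queried in the sampling run
SAMPLE_MINUTES = 326.0              # measured wall-clock time for the sample
TOTAL_SEQUENCES = 9_865_668         # all-against-all input size

minutes_per_sequence = SAMPLE_MINUTES / SAMPLE_SEQUENCES
total_minutes = minutes_per_sequence * TOTAL_SEQUENCES
print(f"estimated single-machine time: {total_minutes:,.0f} minutes "
      f"(~{total_minutes / (60 * 24 * 365):.1f} years)")
```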

One of the biggest BLAST jobs as far as we know
• This scale of experiment is usually infeasible for most scientists
• Allocated a total of ~4000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western Europe, and North Europe
• 8 deployments of AzureBLAST; each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
• Each segment consists of smaller partitions
• When load imbalances occur, redistribute the load manually


• Total size of the output result is ~230 GB
• The number of total hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6~8 days
• Look into the log data to analyze what took place…


A normal log record should look like this:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
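A sketch of the log mining referred to above: pair "Executing" records with their "Execution ... is done" records per node, collect task durations, and flag tasks that never completed. The regular expressions assume the record format shown above.

```python
# Sketch: scan worker logs for tasks that started but never finished, and
# collect durations for the ones that did. The regexes assume the record
# format shown above ("M/D/YYYY H:MM <node> Executing the task <id>").
import re

START = re.compile(r"^([\d/]+ [\d:]+) (\S+) Executing the task (\d+)")
DONE = re.compile(r"^[\d/]+ [\d:]+ (\S+) Execution of task (\d+) is done, it took ([\d.]+) ?mins")

def analyze(log_lines):
    in_flight = {}      # (node, task_id) -> start timestamp string
    durations = {}      # (node, task_id) -> minutes
    for line in log_lines:
        m = START.match(line)
        if m:
            started_at, node, task_id = m.groups()
            in_flight[(node, task_id)] = started_at
            continue
        m = DONE.match(line)
        if m:
            node, task_id, minutes = m.groups()
            in_flight.pop((node, task_id), None)
            durations[(node, task_id)] = float(minutes)
    return durations, in_flight   # in_flight holds tasks with no completion record

log = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
durations, missing = analyze(log)
print("completed:", durations)
print("never completed:", missing)   # task 251774 shows up here
```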

North Europe Data Center: a total of 34,256 tasks processed
• All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain (~30 mins, ~6 nodes in one group)
• 35 nodes experienced blob writing failures at the same time

West Europe Data Center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma, air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
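A direct transcription of the expression above as a function, with no unit handling or validation; the example argument values are placeholders rather than MODIS-derived inputs.

```python
# Sketch: evaluate the Penman-Monteith expression above.
# ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
# Units follow the variable list above; the example arguments are placeholders.
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s, lam_v, gamma=66.0):
    """Return ET given the meteorological and conductance inputs defined above."""
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lam_v
    return numerator / denominator

# Placeholder inputs, purely to show the call shape:
et = penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1004.0,
                        dq=1000.0, g_a=0.02, g_s=0.01, lam_v=2450.0)
print(et)
```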

NASA MODIS imagery source archives: 5 TB (600K files)
FLUXNET curated sensor dataset: 30 GB (960 files)
FLUXNET curated field dataset: 2 KB (1 file)
NCEP/NCAR: ~100 MB (4K files)
Vegetative clumping: ~5 MB (1 file)
Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Diagram: MODISAzure pipeline. Scientists submit requests through the AzureMODIS Service Web Role Portal (Request Queue, Source Metadata); the Download Queue feeds the Data Collection Stage from the Source Imagery Download Sites; the Reprojection Queue feeds the Reprojection Stage; the Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; science results are available for Scientific Results Download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files (see the sketch below)
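As an illustration of that marshalling step, here is a hedged Python sketch using the azure-storage-blob package (v12). The container names, blob paths, and the `reproject.exe` executable are placeholders; the actual MODISAzure workers were written against the .NET storage client, so treat this only as a sketch of the download–run–upload pattern.

```python
# Sketch of blob marshalling: stage inputs from blob storage to local disk,
# run the (sandboxed) executable, push results back to blob storage.
import subprocess
from pathlib import Path
from azure.storage.blob import BlobServiceClient

def run_task(conn_str, input_blob, output_blob, workdir="scratch"):
    svc = BlobServiceClient.from_connection_string(conn_str)
    Path(workdir).mkdir(exist_ok=True)

    # 1. Marshall input: blob storage -> local worker instance file
    local_in = Path(workdir) / "input.hdf"
    data = svc.get_blob_client(container="swath-source", blob=input_blob).download_blob().readall()
    local_in.write_bytes(data)

    # 2. Run the science executable against the local copy (sandboxed step);
    #    "reproject.exe" is a hypothetical stand-in for the stage executable.
    local_out = Path(workdir) / "output.tif"
    subprocess.run(["reproject.exe", str(local_in), str(local_out)], check=True)

    # 3. Marshall output: local file -> blob storage
    svc.get_blob_client(container="reprojected", blob=output_blob).upload_blob(
        local_out.read_bytes(), overwrite=True)
```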

[Diagram: Service Monitor (Worker Role) – Parse & Persist <PipelineStage>TaskStatus → Dispatch → <PipelineStage> Task Queue → GenericWorker (Worker Role) instances ↔ <Input> Data Storage]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a worker-loop sketch follows)
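A minimal sketch of that dequeue-and-retry loop, again with in-memory stand-ins for the Azure queue and status table rather than the actual GenericWorker code:

```python
# Sketch of the GenericWorker loop: dequeue a task, run it, retry up to 3
# times, and record the final status. task_queue / task_status mirror the
# in-memory stand-ins used above; run_task is whatever executable wrapper
# the pipeline stage defines.

MAX_ATTEMPTS = 3

def worker_loop(task_queue, task_status, run_task):
    while task_queue:
        task = task_queue.pop(0)                 # dequeue the next task
        task["attempts"] += 1
        key = (task["job_id"], task["tile"])
        try:
            run_task(task)
            task_status[key] = "Succeeded"
        except Exception as exc:
            if task["attempts"] < MAX_ATTEMPTS:
                task_queue.append(task)          # re-queue for another attempt
                task_status[key] = f"Retrying ({task['attempts']})"
            else:
                task_status[key] = f"Failed: {exc}"
```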

[Diagram: Reprojection Request → Service Monitor (Worker Role) → Persist ReprojectionJobStatus → Job Queue → Parse & Persist ReprojectionTaskStatus → Dispatch → Task Queue → GenericWorker (Worker Role) instances, drawing on ScanTimeList, SwathGranuleMeta, Swath Source Data Storage, and Reprojection Data Storage]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline diagram: Request Queue, Download Queue, Data Collection Stage (Source Imagery Download Sites, Source Metadata), Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue, Scientific Results Download for Scientists]

Scale and cost by stage (via the AzureMODIS Service Web Role portal):
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers – $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 compute hours, 20-100 workers – $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 compute hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                              • Day 2 - Azure as a PaaS
                                                                                                                                                              • Day 2 - Applications

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern (see the sketch after the citation below)
  • split the input sequences
  • query the partitions in parallel
  • merge the results together when done
• Follows the general suggested application model: Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud

Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010.
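A minimal Python sketch of that query-segmentation (split/join) pattern, run locally rather than across Azure roles. The FASTA splitting logic, file names, and the legacy `blastall` invocation are assumptions for illustration, not the AzureBLAST implementation itself.

```python
# Sketch of the split/join (query-segmentation) pattern run locally.
# In AzureBLAST the partitions travel through an Azure queue to worker roles;
# here a thread pool stands in for the elastic pool of workers.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def split_fasta(path, seqs_per_partition=100):
    """Split a FASTA file into partitions of ~100 sequences."""
    partitions, current, count = [], [], 0
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            if count and count % seqs_per_partition == 0:
                partitions.append(current)
                current = []
            count += 1
        current.append(line)
    if current:
        partitions.append(current)
    return partitions

def blast_partition(index, lines, cores=8):
    """Run legacy NCBI BLAST on one partition; '-a' sets the core count
    (1/2/4/8 by worker instance size in AzureBLAST)."""
    query, out = Path(f"part_{index}.fa"), Path(f"part_{index}.out")
    query.write_text("\n".join(lines) + "\n")
    subprocess.run(["blastall", "-p", "blastp", "-d", "nr",
                    "-i", str(query), "-o", str(out), "-a", str(cores)],
                   check=True)
    return out

def split_blast_join(fasta_path):
    partitions = split_fasta(fasta_path)                      # splitting task
    with ThreadPoolExecutor() as pool:                        # BLAST tasks in parallel
        outputs = list(pool.map(lambda p: blast_partition(*p), enumerate(partitions)))
    Path("merged_results.out").write_text(                    # merging task
        "".join(o.read_text() for o in outputs))
```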

A simple Split/Join pattern

Leverage the multiple cores of one instance
• argument "-a" of NCBI-BLAST
• set to 1, 2, 4, or 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Large partitions: load imbalance
• Small partitions: unnecessary overheads
  • NCBI-BLAST overhead
  • Data-transfer overhead
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
  • too small: repeated computation
  • too large: an unnecessarily long wait before another worker picks the task up if the instance fails
• Best practice (sketched below)
  • Estimate the value based on the number of base pairs in the partition plus test runs
  • Watch out for the 2-hour maximum limitation
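A hedged sketch of that estimate: the per-base rate and safety factor below are made-up constants you would calibrate from profiling test runs, and the result is clamped to the queue's 2-hour visibility-timeout ceiling.

```python
# Sketch: pick a visibilityTimeout for a BLAST task from the partition size.
# SECONDS_PER_KILOBASE and SAFETY_FACTOR are placeholders calibrated by test runs.
from datetime import timedelta

MAX_VISIBILITY = timedelta(hours=2)     # platform ceiling noted above
SECONDS_PER_KILOBASE = 1.5              # hypothetical, from profiling
SAFETY_FACTOR = 1.5                     # headroom so slow tasks are not re-dispatched

def estimate_visibility_timeout(total_bases_in_partition):
    est = timedelta(seconds=SECONDS_PER_KILOBASE * total_bases_in_partition / 1000
                    * SAFETY_FACTOR)
    # Too small -> the task reappears and gets recomputed; too large -> a failed
    # instance keeps the task invisible for hours. Clamp to the platform limit.
    return min(est, MAX_VISIBILITY)

print(estimate_visibility_timeout(2_000_000))   # e.g. a 2 Mb partition
```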

[Diagram: Splitting task → BLAST task, BLAST task, BLAST task, … → Merging task]

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instance sizes
• Primarily due to the larger memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• They fully utilize the resources

[Architecture diagram: a Web Role hosting the Web Portal and Web Service (job registration); a Job Management Role with the Job Scheduler and Scaling Engine; a global dispatch queue feeding Worker instances; a Database-updating Role; Azure Table holding the Job Registry; Azure Blob holding the NCBI/BLAST databases, temporary data, etc. Each job again follows Splitting task → BLAST tasks → Merging task.]

ASP.NET program hosted by a web role instance
• Submit jobs
• Track each job's status and logs
• Authentication/authorization based on Live ID
• The accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state (see the sketch below)
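A small sketch of that fault-tolerance idea: every accepted job is written to a durable registry before anything else happens, so a restarted role can rebuild its state by re-reading the table. The dictionary below stands in for the Azure Table; the real AzureBLAST job registry schema is not shown here.

```python
# Sketch: persist accepted jobs to a durable registry instead of keeping them
# only in memory. JOB_REGISTRY stands in for an Azure Table.
import uuid
from datetime import datetime, timezone

JOB_REGISTRY = {}    # stand-in for the job registry table

def register_job(user_id, query_blob, database="nr"):
    job_id = str(uuid.uuid4())
    JOB_REGISTRY[job_id] = {
        "user": user_id,
        "query_blob": query_blob,          # pointer to the uploaded sequences
        "database": database,
        "state": "Accepted",
        "submitted": datetime.now(timezone.utc).isoformat(),
    }
    return job_id                          # the web role returns this to the user

def recover_pending_jobs():
    """On role restart, rebuild the scheduler's work list from the registry."""
    return [jid for jid, j in JOB_REGISTRY.items() if j["state"] != "Done"]
```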

[Diagram: Web Portal / Web Service (Job Portal) → Job registration → Job Registry → Job Scheduler and Scaling Engine]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)
• Blasted ~5000 proteins (700K sequences)
  • Against all NCBI non-redundant proteins: completed in 30 min
  • Against ~5000 proteins from another strain: completed in less than 30 sec
• AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences
• "All against All" query: the database is also the input query
  • The protein database is large (42 GB)
  • In total, 9,865,668 sequences to be queried
  • Theoretically, 100 billion sequence comparisons
• Performance estimation, based on sample runs on one extra-large Azure instance
  • Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs we know of
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide the 10 million sequences into multiple segments (see the sketch below)
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When the load is imbalanced, redistribute it manually
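A hedged sketch of that two-level decomposition (sequences → one segment per deployment → partitions per task); the deployment names and sizes below are illustrative only.

```python
# Sketch: two-level split of the input sequences. Segments map to AzureBLAST
# deployments (one job each); partitions within a segment map to queue tasks.
def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

def plan_all_against_all(sequence_ids, deployments, seqs_per_partition=100):
    segments = chunk(sequence_ids, -(-len(sequence_ids) // len(deployments)))  # ceil-divide
    plan = {}
    for deployment, segment in zip(deployments, segments):
        plan[deployment] = chunk(segment, seqs_per_partition)   # one job, many task partitions
    return plan

# Illustrative use: 10 sequence IDs over 2 hypothetical deployments.
print(plan_all_against_all([f"seq{i}" for i in range(10)],
                           ["deploy-us-1", "deploy-eu-north"], seqs_per_partition=3))
```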


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record should look like the pairs below; otherwise something is wrong (e.g. a task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
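A sketch of the kind of log analysis used here: pair up "Executing" and "is done" records per node and flag tasks that never completed. The regular expressions assume the log format shown above.

```python
# Sketch: find tasks that were started but never reported completion.
import re

START = re.compile(r"(\S+ \S+) (\S+) Executing the task (\d+)")
DONE = re.compile(r"(\S+ \S+) (\S+) Execution of task (\d+) is done")

def find_incomplete_tasks(log_lines):
    started, finished = {}, set()
    for line in log_lines:
        if m := START.search(line):
            started[(m.group(2), m.group(3))] = m.group(1)   # (node, task) -> start time
        elif m := DONE.search(line):
            finished.add((m.group(2), m.group(3)))
    return {k: t for k, t in started.items() if k not in finished}

log = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(find_incomplete_tasks(log))   # -> task 251774 never completed
```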

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups of ~6 nodes over ~30 mins – this is an update domain at work
• 35 nodes experienced a blob-writing failure at the same time

West Europe data center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m3 s-1 m-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = latent heat of vaporization (J g-1)
Rn = net radiation (W m-2)
cp = specific heat capacity of air (J kg-1 K-1)
ρa = dry air density (kg m-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s-1)
gs = conductivity of plant stomata (inverse of rs) (m s-1)
γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

$$ET = \frac{\Delta R_n + \rho_a c_p\, \delta q\, g_a}{\big(\Delta + \gamma(1 + g_a/g_s)\big)\,\lambda_v}$$

Penman-Monteith (1964)

                                                                                                                                                                Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants
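For reference, a small Python sketch that evaluates the Penman-Monteith expression above for a single grid cell; the sample input values are invented (though plausible) and only meant to show the shape of the per-pixel computation in the derivation reduction stage.

```python
# Sketch: evaluate the Penman-Monteith expression for one pixel.
# Inputs follow the variable list above; the example numbers are made up.

def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2450.0):
    """ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

    delta    : Pa/K, slope of the saturation humidity curve
    r_n      : W/m^2, net radiation
    rho_a    : kg/m^3, dry air density
    c_p      : J/(kg·K), specific heat of air
    dq       : Pa, vapor pressure deficit
    g_a, g_s : m/s, aerodynamic and stomatal conductivities
    gamma    : Pa/K, psychrometric constant (~66)
    lambda_v : J/g, latent heat of vaporization (~2450 at 20 °C)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Example with invented mid-latitude summer values:
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                         c_p=1005.0, dq=1200.0, g_a=0.02, g_s=0.01))
```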

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a toy resampling sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
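For the reprojection stage, a toy nearest-neighbor resampler gives the flavor of what "simple nearest neighbor" means here. This is a minimal sketch: the coordinate mapping (a plain 2x upsample) is a stand-in for the real swath-to-sinusoidal projection math, not the MODISAzure code.

```python
# Toy nearest-neighbor resampling: for each output pixel, copy the value of
# the closest source pixel under a caller-supplied coordinate mapping.
def nearest_neighbor(src, out_rows, out_cols, to_src):
    """to_src maps an output (row, col) to fractional source coordinates."""
    max_r, max_c = len(src) - 1, len(src[0]) - 1
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            sr, sc = to_src(r, c)
            out[r][c] = src[min(round(sr), max_r)][min(round(sc), max_c)]
    return out

src = [[1, 2], [3, 4]]                                            # a tiny "source tile"
print(nearest_neighbor(src, 4, 4, lambda r, c: (r / 2, c / 2)))   # toy 2x upsample
```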

[Architecture diagram: the AzureMODIS Service Web Role Portal feeds a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 Queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery is pulled from the Source Imagery Download Sites, source metadata is tracked alongside, and scientists retrieve science results via Scientific Results Download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Azure Tables
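As a rough illustration of the front-door / Service Monitor split described above, the sketch below fakes the Azure Queue and Tables with in-memory structures; the entity fields and function names are assumptions for the example, not the actual MODISAzure schema.

```python
# Minimal sketch: a web-role front door accepts a job, the Service Monitor
# splits it into recoverable tasks, and all status is persisted (here, dicts
# and queue.Queue stand in for Azure Tables and Queues).
import queue
import uuid

job_status_table = {}        # stands in for the <PipelineStage>JobStatus table
task_status_table = {}       # stands in for the <PipelineStage>TaskStatus table
task_queue = queue.Queue()   # stands in for the <PipelineStage>Task queue

def accept_request(stage):
    """Web Role front door: persist the job request and report it as queued."""
    job_id = str(uuid.uuid4())
    job_status_table[job_id] = {"Stage": stage, "Status": "Queued"}
    return job_id

def parse_and_dispatch(job_id, tile_ids):
    """Service Monitor: split the job into recoverable tasks and dispatch them."""
    for tile in tile_ids:
        task_id = f"{job_id}:{tile}"
        task_status_table[task_id] = {"JobId": job_id, "Tile": tile,
                                      "Status": "Dispatched", "Retries": 0}
        task_queue.put(task_id)
    job_status_table[job_id]["Status"] = "Dispatched"

job_id = accept_request("Reprojection")
parse_and_dispatch(job_id, ["h08v05", "h09v05"])
print(len(task_status_table), "tasks dispatched")
```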

[Dispatch diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches work to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshalls all storage between Azure blob storage and local files on the Azure worker instance

[Worker diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks up to 3 times
• Maintains all task status
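A minimal sketch of that dequeue-and-retry behavior, again with an in-memory queue and status dictionary standing in for the Azure Queue and Table; run_task is a placeholder for invoking the sandboxed executable.

```python
# GenericWorker loop: dequeue a task, run it, and re-queue on failure up to
# three times, recording status along the way.
import queue

MAX_RETRIES = 3
task_queue = queue.Queue()
task_status = {"task-1": {"Status": "Dispatched", "Retries": 0}}
task_queue.put("task-1")

def run_task(task_id):
    """Placeholder for running the sandboxed executable on local files."""
    return True  # pretend the task succeeded

def worker_loop():
    while not task_queue.empty():
        task_id = task_queue.get()
        try:
            if run_task(task_id):
                task_status[task_id]["Status"] = "Completed"
        except Exception:
            entry = task_status[task_id]
            entry["Retries"] += 1
            if entry["Retries"] < MAX_RETRIES:
                task_queue.put(task_id)   # make the task visible again
            else:
                entry["Status"] = "Failed"

worker_loop()
print(task_status)
```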

[Reprojection dispatch diagram: a reprojection request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role), parsed and persisted as ReprojectionTaskStatus, and dispatched via the Job and Task Queues to GenericWorker (Worker Role) instances; tasks point to the ScanTimeList and SwathGranuleMeta tables, Swath Source Data Storage, and Reprojection Data Storage.]

Table annotations from the diagram:
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
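To make those annotations concrete, here is an illustrative shape for a ReprojectionTaskStatus entity; the field names and the partition/row key choice are assumptions for the sketch, not the published MODISAzure schema.

```python
# Illustrative task-status entity of the kind the diagram implies.
from dataclasses import dataclass

@dataclass
class ReprojectionTaskStatus:
    partition_key: str   # assumed: the job id, so a job's tasks can be scanned together
    row_key: str         # assumed: the target sinusoidal tile id
    status: str          # Dispatched / Completed / Failed
    retries: int = 0

task = ReprojectionTaskStatus(partition_key="job-42",
                              row_key="h08v05",
                              status="Dispatched")
print(task)
```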

• Computational costs are driven by the data scale and the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline diagram repeated, annotated with the per-stage data sizes and costs listed below.]

Per-stage data volumes and costs:
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420
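The per-stage CPU costs line up with the listed compute hours if one assumes the 2010-era small-instance rate of roughly $0.12 per hour (an assumption; the rate is not stated on the slide):

```python
# Quick consistency check of the per-stage CPU costs above, assuming a
# $0.12-per-compute-hour rate (assumed, not taken from the slide).
RATE = 0.12  # USD per compute hour

for stage, hours in [("Reprojection", 3500),
                     ("Derivation reduction", 1800),
                     ("Analysis reduction", 1800)]:
    print(f"{stage}: {hours} h x ${RATE}/h = ${hours * RATE:.0f}")
# Reprojection: 3500 h x $0.12/h = $420
# Derivation reduction: 1800 h x $0.12/h = $216
# Analysis reduction: 1800 h x $0.12/h = $216
```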

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                • Day 2 - Azure as a PaaS
                                                                                                                                                                • Day 2 - Applications

A simple Split/Join pattern

Leverage the multi-core capacity of one instance
• The "-a" argument of NCBI-BLAST
• 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
• Too large a partition: load imbalance
• Too small a partition: unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead)
• Best practice: use test runs to profile, and set the partition size to mitigate the overhead

Value of visibilityTimeout for each BLAST task
• Essentially an estimate of the task run time
• Too small: repeated computation
• Too large: unnecessarily long waiting time in case of an instance failure
• Best practice: estimate the value from the number of base pairs in the partition and from test runs
• Watch out for the 2-hour maximum limitation
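A small sketch of how such an estimate might be derived; the throughput constant and safety factor below are made-up profiling numbers, not AzureBLAST's actual values, and the 2-hour cap comes from the limitation noted above.

```python
# Estimate a per-task visibilityTimeout from the partition size, with a
# safety margin, capped at the 2-hour maximum.
MAX_TIMEOUT_SECONDS = 2 * 3600   # 2-hour visibility timeout cap

def estimate_visibility_timeout(residues_in_partition,
                                residues_per_second=5000,   # assumed, from profiling
                                safety_factor=1.5):         # assumed margin
    """Return a visibilityTimeout in seconds for one BLAST task."""
    estimate = residues_in_partition / residues_per_second * safety_factor
    return min(int(estimate), MAX_TIMEOUT_SECONDS)

print(estimate_visibility_timeout(3_000_000))  # ~900 s at the assumed rate
```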

[Split/Join diagram: a splitting task fans out into many BLAST tasks, whose outputs are combined by a merging task.]
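A minimal Python sketch of the Split/Join pattern in the diagram, with run_blast as a placeholder for shelling out to NCBI-BLAST; the 100-sequence partition size anticipates the observation below.

```python
# Split the query sequences into fixed-size partitions, run one BLAST task
# per partition, then merge the per-partition outputs.
def split(sequences, partition_size=100):
    return [sequences[i:i + partition_size]
            for i in range(0, len(sequences), partition_size)]

def run_blast(partition):
    """Placeholder for one BLAST task over one partition."""
    return [f"hits-for-{seq}" for seq in partition]

def merge(partial_results):
    return [hit for part in partial_results for hit in part]

sequences = [f"seq{i}" for i in range(250)]
results = merge(run_blast(p) for p in split(sequences))
print(len(results))   # 250
```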

Task size vs. performance
• Benefit of the warm-cache effect
• 100 sequences per partition is the best choice

Instance size vs. performance
• Super-linear speedup with larger worker instances
• Primarily due to the memory capacity

Task size / instance size vs. cost
• Extra-large instances generated the best and most economical throughput
• Fully utilize the resources

[AzureBLAST architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler, Scaling Engine, and database-updating role; Worker instances pull work from a global dispatch queue; an Azure Table holds the Job Registry; Azure Blob storage holds the NCBI databases, BLAST databases, temporary data, etc. The Split/Join pattern (splitting task, BLAST tasks, merging task) runs on the workers.]

• An ASP.NET program hosted by a web role instance
  • Submit jobs
  • Track a job's status and logs
• Authentication/authorization is based on Live ID
• An accepted job is stored in the job registry table
  • Fault tolerance: avoid in-memory state
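An illustrative sketch of the "no in-memory state" idea: accepted jobs go straight into a registry table (here a dictionary stand-in), so a restarted role instance can recover them. The record fields and the example URL are hypothetical, not the AzureBLAST schema.

```python
# Persist accepted jobs in a registry rather than in process memory.
import time
import uuid

job_registry = {}   # stands in for the Job Registry table

def register_job(owner_live_id, query_blob_url):
    job_id = str(uuid.uuid4())
    job_registry[job_id] = {
        "Owner": owner_live_id,        # Live ID of the submitting user
        "QueryBlob": query_blob_url,   # where the query sequences live (hypothetical URL)
        "Status": "Accepted",
        "SubmittedAt": time.time(),
    }
    return job_id

job_id = register_job("user@example.com", "https://example.blob.core.windows.net/queries/job1.fasta")
print(job_registry[job_id]["Status"])   # Accepted
```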

[Job-flow diagram: Web Portal, Web Service, job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry.]

R. palustris as a platform for H2 production (Eric Schadt, Sage; Sam Phattarasukol, Harwood Lab, UW)

BLASTed ~5000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly reduced the computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

"All against All" query
• The database is also the input query
• The protein database is large (4.2 GB)
• 9,865,668 sequences to be queried in total
• Theoretically 100 billion sequence comparisons

Performance estimation
• Based on sample runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists
• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution
  • Each segment consists of smaller partitions
  • When load imbalances occur, redistribute the load manually
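A minimal sketch of that two-level decomposition (segments per deployment, partitions per task); the sizes below are illustrative, not the values used in the actual run.

```python
# Divide the input sequences into segments (one job per deployment), and
# each segment into smaller partitions (one BLAST task each).
def segment(items, n_segments):
    """Split items into n roughly equal segments, one per deployment."""
    size = (len(items) + n_segments - 1) // n_segments
    return [items[i:i + size] for i in range(0, len(items), size)]

def partition(segment_items, partition_size=100):
    return [segment_items[i:i + partition_size]
            for i in range(0, len(segment_items), partition_size)]

sequences = [f"seq{i}" for i in range(10_000)]   # stand-in for ~10M sequences
segments = segment(sequences, n_segments=8)      # 8 AzureBLAST deployments
jobs = [partition(s) for s in segments]          # one job (many tasks) per deployment
print(len(segments), "segments,", sum(len(j) for j in jobs), "tasks")
```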


• Total size of the output result is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days of compute)
• But based on our estimates, the real working instance time should be 6-8 days
• Look into the log data to analyze what took place…


A normal log record pair shows a task starting and then completing; otherwise something is wrong (e.g., the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
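The anomaly hunting described here amounts to matching "Executing" records against "is done" records. A small sketch, assuming the log format shown above is consistent across records:

```python
# Flag tasks that have an "Executing" record but no matching completion record.
import re

log_lines = [
    "3/31/2010 6:14 RD00155D3611B0 Executing the task 251523",
    "3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins",
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
]

text = "\n".join(log_lines)
started = set(re.findall(r"Executing the task (\d+)", text))
finished = set(re.findall(r"Execution of task (\d+) is done", text))

print("never completed:", started - finished)   # {'251774'}
```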

North Europe datacenter: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group; this is an Update Domain
  • ~30 mins per group
  • ~6 nodes in one group
• 35 nodes experienced blob-writing failures at the same time

West Europe datacenter: 30,976 tasks completed before the job was killed
• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish Proverb

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)

Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)

λv = Latent heat of vaporization (J g⁻¹)

Rn = Net radiation (W m⁻²)

cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)

ρa = Dry air density (kg m⁻³)

δq = Vapor pressure deficit (Pa)

ga = Conductivity of air (inverse of ra) (m s⁻¹)

gs = Conductivity of plant stoma (inverse of rs) (m s⁻¹)

γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:

• Lots of inputs; a big data reduction

• Some of the inputs are not so simple

ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga/gs)) λv), i.e.

$ET = \dfrac{\Delta R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}$

                                                                                                                                                                  Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
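A minimal Python sketch of the Penman-Monteith form shown above, using the variable definitions listed earlier. The function name, default constants, and the sample values in the call are illustrative assumptions, not part of MODISAzure.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2.45e6):
    """ET from the Penman-Monteith form shown above.

    delta    -- rate of change of saturation specific humidity with temperature (Pa/K)
    r_n      -- net radiation (W/m^2)
    rho_a    -- dry air density (kg/m^3)
    c_p      -- specific heat capacity of air (J/(kg K))
    dq       -- vapor pressure deficit (Pa)
    g_a, g_s -- conductivities of air and plant stoma (m/s)
    gamma    -- psychrometric constant (~66 Pa/K)
    lambda_v -- latent heat of vaporization (J/kg; ~2.45 MJ/kg, i.e. ~2450 J/g)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Illustrative midday-ish values only; not taken from the MODIS/FLUXNET datasets.
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1200.0, g_a=0.02, g_s=0.01))
```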

NASA MODIS imagery source archives: 5 TB (600K files)

FLUXNET curated sensor dataset: 30 GB (960 files)

FLUXNET curated field dataset: 2 KB (1 file)

NCEP/NCAR: ~100 MB (4K files)

Vegetative clumping: ~5 MB (1 file)

Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage

• Downloads requested input tiles from NASA FTP sites

• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage

• Converts source tile(s) to intermediate-result sinusoidal tiles

• Simple nearest-neighbor or spline algorithms

Derivation reduction stage

• First stage visible to the scientist

• Computes ET in our initial use

Analysis reduction stage

• Optional second stage visible to the scientist

• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(A toy sketch of the four-stage flow follows this list.)
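A toy sketch of the four stages chained together. The function names, tile identifiers, and data shapes are invented for illustration and only stand in for the real MODISAzure implementation.

```python
def collect(tile_requests):
    # Data collection (map): pretend-download each requested input tile.
    return [{"tile": t, "raw": f"bytes-of-{t}"} for t in tile_requests]

def reproject(source_tiles):
    # Reprojection (map): convert source tiles to intermediate sinusoidal tiles.
    return [{"tile": s["tile"], "sinusoidal": True} for s in source_tiles]

def derive_et(sinusoidal_tiles):
    # Derivation reduction: reduce many tiles to one ET result (first stage visible to the scientist).
    return {"et_result_for": [t["tile"] for t in sinusoidal_tiles]}

def analyze(et_result):
    # Analysis reduction (optional): produce maps, tables, or virtual sensors.
    return {"artifact": "ET map", "inputs": et_result}

if __name__ == "__main__":
    tiles = ["h08v05", "h09v05"]
    print(analyze(derive_et(reproject(collect(tiles)))))
```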

[Figure: MODISAzure architecture. Scientists submit requests through the AzureMODIS Service Web Role portal and download scientific results; requests flow through the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues across the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source imagery is pulled from the Source Imagery Download Sites, guided by Source Metadata.]

                                                                                                                                                                  httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The ModisAzure Service is the Web Role front door

  • Receives all user requests

  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue

• The Service Monitor is a dedicated Worker Role

  • Parses all job requests into tasks – recoverable units of work

• Execution status of all jobs and tasks is persisted in Tables

(A minimal sketch of this request flow follows.)
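A minimal, self-contained sketch of the request flow described above, with an in-memory dict standing in for the Azure status Table and deques standing in for the Job/Task Queues. All names are illustrative, not the actual MODISAzure code.

```python
from collections import deque

job_queue, task_queue = deque(), deque()   # stand-ins for the <PipelineStage> Job/Task Queues
status_table = {}                          # stand-in for the Azure Table holding job/task status

def web_role_accept(request):
    """MODISAzure Service (Web Role): persist job status and enqueue the job."""
    job_id = f"job-{len(status_table)}"
    status_table[job_id] = {"stage": request["stage"], "state": "Queued"}
    job_queue.append({"job_id": job_id, **request})
    return job_id

def service_monitor_step():
    """Service Monitor (Worker Role): parse one job into recoverable tasks and dispatch them."""
    job = job_queue.popleft()
    for i, tile in enumerate(job["tiles"]):
        task_id = f"{job['job_id']}-task-{i}"
        status_table[task_id] = {"state": "Dispatched", "tile": tile}
        task_queue.append({"task_id": task_id, "tile": tile})
    status_table[job["job_id"]]["state"] = "TasksDispatched"

job_id = web_role_accept({"stage": "Reprojection", "tiles": ["h08v05", "h09v05"]})
service_monitor_step()
print(status_table[job_id], len(task_queue))
```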

[Figure: generic job flow. A <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role

• Sandboxes the science or other executable

• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files (a staging sketch follows)
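A sketch of the marshalling idea, using local directories as stand-ins for an Azure blob container and the worker's local disk; the helper names and file names are invented for illustration.

```python
import shutil
from pathlib import Path

BLOB_STORE = Path("fake_blob_store")   # stand-in for an Azure blob container
LOCAL_DIR = Path("worker_local")       # stand-in for the worker instance's local storage

def stage_in(blob_name):
    """Copy an input 'blob' to local storage before running the sandboxed executable."""
    LOCAL_DIR.mkdir(exist_ok=True)
    local_path = LOCAL_DIR / blob_name
    shutil.copy(BLOB_STORE / blob_name, local_path)
    return local_path

def stage_out(local_path, blob_name):
    """Copy an output file back to 'blob' storage after the executable finishes."""
    BLOB_STORE.mkdir(exist_ok=True)
    shutil.copy(local_path, BLOB_STORE / blob_name)

if __name__ == "__main__":
    BLOB_STORE.mkdir(exist_ok=True)
    (BLOB_STORE / "input_tile.hdf").write_text("fake MODIS tile")
    local = stage_in("input_tile.hdf")
    # ... the sandboxed science executable would run against `local` here ...
    stage_out(local, "output_tile.hdf")
```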

[Figure: task execution flow. The Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor

• Retries failed tasks 3 times

• Maintains all task status

(A minimal dequeue-and-retry sketch follows.)
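A minimal sketch of the dequeue-and-retry behavior, again with in-memory stand-ins for the Task Queue and status Table. The 3-retry limit mirrors the bullet above; everything else (names, the toy `execute` callback) is illustrative.

```python
from collections import deque

MAX_RETRIES = 3

def run_worker(task_queue, status_table, execute):
    """GenericWorker loop: dequeue, execute, retry up to MAX_RETRIES, record status."""
    while task_queue:
        task = task_queue.popleft()
        try:
            execute(task)
            status_table[task["task_id"]] = "Done"
        except Exception:
            task["attempts"] = task.get("attempts", 0) + 1
            if task["attempts"] < MAX_RETRIES:
                task_queue.append(task)   # put it back for another attempt
                status_table[task["task_id"]] = f"Retrying ({task['attempts']})"
            else:
                status_table[task["task_id"]] = "Failed"

if __name__ == "__main__":
    status = {}
    queue = deque([{"task_id": "t1"}, {"task_id": "t2"}])
    # t1 succeeds immediately; t2 always raises and exhausts its retries.
    run_worker(queue, status, execute=lambda t: None if t["task_id"] == "t1" else 1 / 0)
    print(status)   # {'t1': 'Done', 't2': 'Failed'}
```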

[Figure: reprojection job flow. A Reprojection Request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role) via the Job Queue, parsed into ReprojectionTaskStatus entities, and dispatched through the Task Queue to GenericWorker (Worker Role) instances. Each ReprojectionJobStatus entity specifies a single reprojection job request; each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile). Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile; query the ScanTimeList table to get the list of satellite scan times that cover a target tile. Workers read Swath Source Data Storage and write Reprojection Data Storage.]

• Computational costs are driven by data scale and the need to run the reduction multiple times

• Storage costs are driven by data scale and the 6-month project duration

• Both are small with respect to the people costs, even at graduate-student rates

[Figure: MODISAzure pipeline costs by stage (same architecture as above: AzureMODIS Service Web Role portal, Download/Reprojection/Reduction queues, source imagery download sites).

Data collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage.

Reprojection stage: 400 GB, 45K files, 3,500 hours, 20-100 workers; $420 compute, $60 download.

Derivation reduction stage: 5-7 GB, 55K files, 1,800 hours, 20-100 workers; $216 compute, $1 download, $6 storage.

Analysis reduction stage: <10 GB, ~1K files, 1,800 hours, 20-100 workers; $216 compute, $2 download, $9 storage.

Total: ~$1,420.]

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems

• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access

• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today

• Clouds provide valuable fault tolerance and scalability abstractions

• Clouds act as an amplifier for familiar client tools and on-premises compute

• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                  • Day 2 - Azure as a PaaS
                                                                                                                                                                  • Day 2 - Applications

Task Size vs. Performance

• Benefit of the warm-cache effect

• 100 sequences per partition is the best choice

Instance Size vs. Performance

• Super-linear speedup with larger worker instances

• Primarily due to the memory capacity

Task Size / Instance Size vs. Cost

• The extra-large instance generated the best and most economical throughput

• Fully utilizes the resources

[Figure: AzureBLAST architecture. A Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine and records jobs in the Job Registry (Azure Table); worker instances pull splitting, BLAST, and merging tasks from a global dispatch queue; Azure Blob storage holds the NCBI/BLAST databases and temporary data; a database-updating Role keeps the databases current.]

An ASP.NET program hosted by a web role instance

• Submit jobs

• Track a job's status and logs

Authentication/authorization is based on Live ID.

The accepted job is stored in the job registry table

• Fault tolerance: avoid in-memory states (a persistence sketch follows)
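A small sketch of the "persist, don't hold in memory" idea, using SQLite as a stand-in for the Azure job registry table; the schema, job id, and owner string are illustrative.

```python
import sqlite3

def open_registry(path="job_registry.db"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                 "(job_id TEXT PRIMARY KEY, owner TEXT, state TEXT)")
    return conn

def register_job(conn, job_id, owner):
    # The accepted job goes straight to durable storage, not an in-memory list,
    # so a web-role restart cannot lose it.
    conn.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, 'Accepted')", (job_id, owner))
    conn.commit()

if __name__ == "__main__":
    conn = open_registry()
    register_job(conn, "blast-0001", "liveid:alice@example.com")
    print(conn.execute("SELECT * FROM jobs").fetchall())
```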

[Figure: job registration components: Web Portal, Web Service, Job registration, Job Scheduler, Job Portal, Scaling Engine, Job Registry.]

R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)

• Against all NCBI non-redundant proteins: completed in 30 min

• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering Homologs

• Discover the interrelationships of known protein sequences

"All against All" query

• The database is also the input query

• The protein database is large (42 GB); in total, 9,865,668 sequences to be queried

• Theoretically, 100 billion sequence comparisons

Performance estimation

• Based on sampling runs on one extra-large Azure instance

• Would require 3,216,731 minutes (6.1 years) on one desktop (a quick conversion check follows)
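The desktop estimate above is just the sampled per-comparison cost scaled up; a two-line check of the minutes-to-years conversion:

```python
minutes = 3_216_731
print(round(minutes / (60 * 24 * 365), 1))   # ~6.1 years
```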

One of the biggest BLAST jobs we know of

• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances

• 475 extra-large VMs (8 cores per VM) across four data centers: US (2), West Europe, and North Europe

• 8 deployments of AzureBLAST; each deployment has its own co-located storage service

• Divided the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution

• Each segment consists of smaller partitions (a toy segmentation sketch follows)

• When load imbalances appear, redistribute the load manually
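A toy sketch of the segmentation scheme described above: split the input sequences into one segment per deployment, then cut each segment into smaller partitions that become individual tasks. The partition size of 100 echoes the earlier "100 sequences per partition" finding; everything else is an arbitrary example.

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def plan_jobs(sequences, n_deployments, partition_size=100):
    """One job (segment) per deployment; each job is a list of partitions (tasks)."""
    segment_size = -(-len(sequences) // n_deployments)   # ceiling division
    segments = chunk(sequences, segment_size)
    return {f"deployment-{i}": chunk(seg, partition_size)
            for i, seg in enumerate(segments)}

if __name__ == "__main__":
    seqs = [f"seq-{n}" for n in range(1000)]   # small stand-in for ~10M sequences
    jobs = plan_jobs(seqs, n_deployments=8)
    print({d: len(parts) for d, parts in jobs.items()})
```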


• Total size of the output result is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like the paired entries below; otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
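To see which workers went quiet or which tasks ran abnormally long, a short script over these logs is enough. The sketch below is illustrative only (not the tooling actually used for the experiment) and assumes the cleaned-up timestamp format shown above.

# Illustrative sketch: parse worker-role log lines like the ones above to flag
# long gaps between events and unusually long or never-completed tasks.
import re
from datetime import datetime, timedelta

LINE = re.compile(
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}) (?P<node>\S+) "
    r"(?P<event>Executing the task|Execution of task) (?P<task>\d+)"
)

def scan(log_lines, max_gap_mins=30, max_task_mins=60):
    started = {}            # task id -> start time
    last_event_time = None
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M")
        task = m.group("task")
        if last_event_time and ts - last_event_time > timedelta(minutes=max_gap_mins):
            print(f"Gap of {ts - last_event_time} before task {task} (node restarted?)")
        last_event_time = ts
        if m.group("event").startswith("Executing"):
            started[task] = ts
        else:
            begun = started.pop(task, None)
            if begun and ts - begun > timedelta(minutes=max_task_mins):
                print(f"Task {task} ran {ts - begun} -- unusually long")
    for task in started:
        print(f"Task {task} was started but never completed")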

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in groups (~6 nodes per group, ~30 mins each): this is an update domain
• 35 nodes experienced a blob-writing failure at the same time

West Europe Data Center: 30,976 tasks were completed before the job was killed
• A reasonable guess: the fault domain is at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m3 s-1 m-2)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K-1)
λv = Latent heat of vaporization (J g-1)
Rn = Net radiation (W m-2)
cp = Specific heat capacity of air (J kg-1 K-1)
ρa = Dry air density (kg m-3)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s-1)
gs = Conductivity of plant stoma air (inverse of rs) (m s-1)
γ = Psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

$$ET = \frac{\Delta\, R_n + \rho_a\, c_p\, \delta q\, g_a}{\left(\Delta + \gamma\,(1 + g_a/g_s)\right)\lambda_v}$$

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.
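For concreteness, here is a minimal Python sketch of the Penman-Monteith formula above; the parameter names follow the slide's symbols, and the example values at the end are placeholders, not numbers taken from the MODIS pipeline.

# Minimal sketch of the Penman-Monteith calculation shown above.
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2260.0):
    """Return ET from the Penman-Monteith equation.

    delta    : rate of change of saturation specific humidity with air temperature (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivities of air and plant stoma (m/s)
    gamma    : psychrometric constant (~66 Pa/K)
    lambda_v : latent heat of vaporization (J/g)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Example call with placeholder inputs (illustrative values only):
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                         c_p=1005.0, dq=1000.0, g_a=0.02, g_s=0.01))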

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

Scale: 20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal accepts requests into a Request Queue; the Data Collection Stage pulls source imagery from the download sites plus source metadata, then feeds the Reprojection Queue → Reprojection Stage → Reduction 1 Queue → Derivation Reduction Stage → Reduction 2 Queue → Analysis Reduction Stage → Download Queue, from which scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables (a sketch of this parse-persist-dispatch pattern follows)
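The pattern is small enough to sketch: a job request is parsed into recoverable tasks, each task's status is persisted, and only then is the task id dispatched onto a queue. This is not the MODISAzure source; an in-memory dict and list stand in for the Azure Table and Queue services, and the tile names are made up.

# Illustrative sketch of the Service Monitor pattern described above.
import uuid

task_status_table = {}   # stand-in for the <PipelineStage>TaskStatus table
task_queue = []          # stand-in for the <PipelineStage>Task queue

def parse_and_dispatch(job_request, tiles):
    job_id = str(uuid.uuid4())
    for tile in tiles:
        task_id = f"{job_id}/{tile}"
        # Persist first, so the task is recoverable even if dispatch fails midway.
        task_status_table[task_id] = {"job": job_id, "tile": tile, "status": "Queued"}
        task_queue.append(task_id)
    return job_id

job = parse_and_dispatch({"stage": "Reprojection", "day": "2009-06-01"},
                         tiles=["h08v05", "h09v05", "h10v05"])
print(job, len(task_queue), "tasks queued")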

[Diagram: a <PipelineStage>Request enters the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull from that queue and read <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
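A GenericWorker of this kind reduces to a dequeue-execute-retry loop. The sketch below illustrates the pattern only: run_task is a placeholder for staging blobs locally and launching the sandboxed executable, and the queue/table are in-memory stand-ins rather than the real Azure services.

# Sketch of the GenericWorker loop: dequeue, run, retry up to 3 times, record status.
import random

MAX_ATTEMPTS = 3
# Declared here so the sketch runs on its own; in the previous sketch the
# Service Monitor populates these stand-ins.
task_queue = ["job-1/h08v05", "job-1/h09v05", "job-1/h10v05"]
task_status_table = {t: {"status": "Queued"} for t in task_queue}

def run_task(task_id):
    # Placeholder for the real work: copy blobs to local storage, run the sandboxed
    # science executable, then upload results back to blob storage.
    if random.random() < 0.2:
        raise RuntimeError("simulated task failure")

def worker_loop():
    while task_queue:
        task_id = task_queue.pop(0)
        record = task_status_table[task_id]
        record["attempts"] = record.get("attempts", 0) + 1
        try:
            record["status"] = "Running"
            run_task(task_id)
            record["status"] = "Done"
        except RuntimeError as err:
            if record["attempts"] < MAX_ATTEMPTS:
                record["status"] = "Retrying"
                task_queue.append(task_id)   # put it back for another attempt
            else:
                record["status"] = f"Failed after {MAX_ATTEMPTS} attempts: {err}"

worker_loop()
print(task_status_table)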

[Diagram, reprojection example: a Reprojection Request flows through the Job Queue to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue consumed by GenericWorker (Worker Role) instances that read Swath Source Data Storage and write Reprojection Data Storage.]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate-student rates

[The cost figures below are overlaid on the AzureMODIS pipeline diagram shown earlier (Request Queue → Data Collection Stage → Reprojection Stage → Derivation Reduction Stage → Analysis Reduction Stage → Scientific Results Download).]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420
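As a quick sanity check, the per-stage CPU charges line up with roughly $0.12 per instance-hour (an assumption about the small-instance rate of the time, not a figure stated on the slide), and the listed line items sum to about $1,430, close to the quoted $1,420 total; presumably rounding on the slide accounts for the small difference.

# Quick arithmetic check on the per-stage figures above; the $0.12/hour rate is an assumption.
RATE = 0.12
for stage, hours in [("Reprojection", 3500), ("Derivation reduction", 1800),
                     ("Analysis reduction", 1800)]:
    print(f"{stage}: {hours} h x ${RATE}/h = ${hours * RATE:.0f}")

line_items = [50, 450, 420, 60, 216, 1, 6, 216, 2, 9]
print("Sum of listed line items: $", sum(line_items))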

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                                                                                    • Day 2 - Applications

[AzureBLAST architecture diagram: a Web Role hosts the Web Portal and Web Service for job registration; a Job Management Role runs the Job Scheduler and Scaling Engine; a global dispatch queue feeds the Worker instances; Azure Tables hold the Job Registry, Azure Blobs hold the NCBI/BLAST databases and temporary data, and a Database Updating Role refreshes the databases. Each job is decomposed into a splitting task, many BLAST tasks, and a merging task.]

An ASP.NET program hosted by a web role instance
• Submit jobs
• Track a job's status and logs

Authentication/authorization is based on Live ID

The accepted job is stored in the job registry table
• Fault tolerance: avoid in-memory state (see the sketch below)
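The "no in-memory state" rule boils down to writing a registry record before doing any work and funnelling every status change through that record, so a recycled instance can pick up where it left off. A minimal sketch, with field names that are illustrative rather than the actual AzureBLAST schema:

# Sketch of a job-registry record and status update; an in-memory dict stands in
# for the Azure Table holding the job registry.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class JobRecord:
    job_id: str
    owner_live_id: str
    query_blob: str                  # blob holding the submitted sequences
    status: str = "Accepted"
    submitted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

job_registry = {}                    # stand-in for the job registry table

def accept_job(record: JobRecord):
    job_registry[record.job_id] = record   # persisted before any processing starts

def update_status(job_id: str, status: str):
    job_registry[job_id].status = status   # scheduler/workers read and write this row

accept_job(JobRecord("job-42", "user@example.com", "queries/job-42.fasta"))
update_status("job-42", "Running")
print(job_registry["job-42"])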

[Diagram detail: the Web Portal and Web Service feed job registrations to the Job Scheduler; the Job Portal, Scaling Engine, and Job Registry complete the job-management components.]

R. palustris as a platform for H2 production (Eric Shadt, SAGE; Sam Phattarasukol, Harwood Lab, UW)

Blasted ~5,000 proteins (700K sequences)
• Against all NCBI non-redundant proteins: completed in 30 min
• Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time…

Discovering homologs
• Discover the interrelationships of known protein sequences

An "all against all" query
• The database is also the input query
• The protein database is large (42 GB)
• In total, 9,865,668 sequences to be queried
• Theoretically, 100 billion sequence comparisons

Performance estimation
• Based on sampling runs on one extra-large Azure instance
• Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know
• Experiments at this scale are usually infeasible for most scientists

• Allocated a total of ~4,000 instances
  • 475 extra-large VMs (8 cores per VM) across four data centers: US (2), West Europe, and North Europe
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments
  • Each segment is submitted to one deployment as one job for execution

                                                                                                                                                                      bull Each segment consists of smaller partitions

                                                                                                                                                                      bull When load imbalances redistribute the load manually
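A minimal sketch of the segment/partition split described above; the segment count matches the 8 deployments, but the partition size and sequence IDs are illustrative assumptions, not the values used in the actual runs:

    # Split sequence IDs into segments (one per deployment/job),
    # then split each segment into smaller partitions (one per task).
    def chunk(items, size):
        return [items[i:i + size] for i in range(0, len(items), size)]

    def plan_jobs(sequence_ids, deployments=8, partition_size=250):
        segment_size = -(-len(sequence_ids) // deployments)   # ceiling division
        segments = chunk(sequence_ids, segment_size)
        # Each deployment receives one job; each job is a list of partitions (tasks).
        return {f"deployment-{i}": chunk(seg, partition_size)
                for i, seg in enumerate(segments)}

    jobs = plan_jobs([f"seq{i}" for i in range(10_000)])
    print({name: len(partitions) for name, partitions in jobs.items()})   # partitions per deployment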


• Total size of the output is ~230 GB
• The total number of hits is 1,764,579,487
• Started on March 25th; the last task completed on April 8th (10 days compute)
• But based on our estimates, the real working instance time should be 6–8 days
• Look into the log data to analyze what took place…


A normal log record should be a matched pair of "Executing" and "is done" entries:

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
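A minimal sketch of the kind of log scan used to flag such anomalies, assuming log lines in the reconstructed format above (the parsing code is illustrative, not the analysis tooling actually used):

    import re

    # Flag tasks that have an "Executing" record but no matching "is done" record.
    START = re.compile(r"Executing the task (\d+)")
    DONE  = re.compile(r"Execution of task (\d+) is done")

    def unfinished_tasks(log_lines):
        started, finished = set(), set()
        for line in log_lines:
            if m := START.search(line):
                started.add(m.group(1))
            elif m := DONE.search(line):
                finished.add(m.group(1))
        return started - finished

    log = [
        "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
        "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
        "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
    ]
    print(unfinished_tasks(log))   # {'251774'} -- the task that never completed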

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group: this is an update domain (~30 min, ~6 nodes in one group)
• 35 nodes experienced blob-write failures at the same time

West Europe data center: 30,976 tasks completed, then the job was killed
• A reasonable guess: the fault domain at work
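As an illustration of how such patterns can be spotted, here is a small sketch that groups node outages by start time so that batches of nodes going down together stand out; the node names and timestamps are made up purely for the example:

    from collections import defaultdict
    from datetime import datetime

    # Group node outages that begin within the same 10-minute window; a batch of
    # ~6 nodes restarting together is consistent with an update-domain rollout.
    outages = [
        ("RD001", "2010-03-31 08:00"), ("RD002", "2010-03-31 08:03"),
        ("RD003", "2010-03-31 08:05"), ("RD010", "2010-03-31 09:30"),
    ]

    def group_by_window(events, window_minutes=10):
        buckets = defaultdict(list)
        for node, ts in events:
            t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
            key = t.replace(minute=(t.minute // window_minutes) * window_minutes)
            buckets[key].append(node)
        return buckets

    for window, nodes in sorted(group_by_window(outages).items()):
        print(window, nodes)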

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stomata (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

                                                                                                                                                                      Penman-Monteith (1964)
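A minimal sketch of the formula as a function, using the variable names defined above; the example inputs are placeholders for shape only, not field data:

    # Penman-Monteith ET, directly transcribing the equation above.
    # Arguments follow the units listed in the variable definitions.
    def penman_monteith(delta, r_n, rho_a, c_p, dq, g_a, g_s, lambda_v, gamma=66.0):
        numerator = delta * r_n + rho_a * c_p * dq * g_a
        denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
        return numerator / denominator

    # Placeholder inputs -- not physically meaningful values.
    print(penman_monteith(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                          dq=1000.0, g_a=0.02, g_s=0.01, lambda_v=2450.0))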

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads the requested input tiles from NASA FTP sites
• Includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

(A sketch of the four-stage pipeline follows below.)
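A minimal sketch of how the four stages compose, with the stage bodies stubbed out; the function names, signatures, and tile IDs are illustrative assumptions, not the MODISAzure code:

    # Illustrative four-stage pipeline: two map-style stages feeding two reductions.
    def collect(tile_request):          # Data collection: fetch source tiles
        return [f"raw-{t}" for t in tile_request]

    def reproject(raw_tiles):           # Reprojection: to sinusoidal intermediate tiles
        return [f"sinusoidal-{t}" for t in raw_tiles]

    def derive_et(sinusoidal_tiles):    # Derivation reduction: compute ET per tile
        return {t: 0.0 for t in sinusoidal_tiles}   # ET values would go here

    def analyze(et_by_tile):            # Analysis reduction: maps, tables, virtual sensors
        return {"tiles": len(et_by_tile)}

    print(analyze(derive_et(reproject(collect(["h08v05", "h09v05"])))))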

[Architecture diagram: the AzureMODIS Service Web Role Portal, Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue, Source Metadata, Source Imagery Download Sites, and Scientific Results Download, spanning the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages, with science results delivered back to scientists.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks, the recoverable units of work
• The execution status of all jobs and tasks is persisted in Tables (see the fan-out sketch after the diagram summary below)

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches tasks to the <PipelineStage>Task Queue.]
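A minimal sketch of that job-to-task fan-out, using in-memory stand-ins for the Azure queues and status tables; the function names and queue contents are illustrative assumptions:

    from collections import deque

    # In-memory stand-ins for the <PipelineStage> Job/Task Queues and status tables.
    job_queue, task_queue = deque(), deque()
    job_status, task_status = {}, {}

    def submit_job(job_id, tiles):
        job_status[job_id] = "Queued"          # persisted JobStatus
        job_queue.append((job_id, tiles))

    def service_monitor_step():
        job_id, tiles = job_queue.popleft()
        job_status[job_id] = "Dispatched"
        for i, tile in enumerate(tiles):       # parse the job into recoverable tasks
            task_id = f"{job_id}-{i}"
            task_status[task_id] = "Queued"    # persisted TaskStatus
            task_queue.append((task_id, tile))

    submit_job("reprojection-42", ["h08v05", "h09v05"])
    service_monitor_step()
    print(job_status, task_status, list(task_queue))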

All work is actually done by a Worker Role
• Sandboxes the science (or other) executable
• Marshalls all storage to and from Azure blob storage and local Azure Worker instance files (see the staging sketch below)
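A minimal sketch of that marshalling pattern: stage inputs to local disk, run the executable in its own process, and copy the output back. The download/upload helpers below are stand-ins for blob-storage calls, not the actual SDK code:

    import subprocess, tempfile, pathlib

    # Stand-ins for blob download/upload; a real worker would call the Azure storage client here.
    def download_blob(name, dest): pathlib.Path(dest).write_text(f"contents of {name}")
    def upload_blob(src, name): print(f"uploading {src} as blob {name}")

    def run_task(executable, input_blobs, output_blob):
        with tempfile.TemporaryDirectory() as workdir:
            local_inputs = []
            for blob in input_blobs:                       # marshal inputs to local files
                dest = pathlib.Path(workdir, blob)
                download_blob(blob, dest)
                local_inputs.append(str(dest))
            result = pathlib.Path(workdir, "result.out")
            # Run the science executable in its own process (the "sandbox").
            subprocess.run([executable, *local_inputs, str(result)], check=False)
            if result.exists():
                upload_blob(str(result), output_blob)      # marshal output back to blobs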

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and read from <Input>Data Storage.]

The GenericWorker:
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(see the worker-loop sketch below)
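A minimal sketch of that dequeue-and-retry loop, again with an in-memory queue standing in for the Azure task queue; the retry bookkeeping is one reasonable way to honour the 3-attempt rule, not the actual implementation:

    from collections import deque

    MAX_ATTEMPTS = 3
    task_queue = deque([("task-1", 0)])         # (task_id, attempts so far)
    task_status = {}

    def do_work(task_id):
        raise RuntimeError("simulated failure") # placeholder for the science work

    while task_queue:
        task_id, attempts = task_queue.popleft()
        try:
            do_work(task_id)
            task_status[task_id] = "Done"
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                task_queue.append((task_id, attempts + 1))   # put it back to retry
                task_status[task_id] = f"Retrying ({attempts + 1})"
            else:
                task_status[task_id] = "Failed"              # give up after 3 attempts

    print(task_status)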

[Diagram: a reprojection request flows through the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches via the Job and Task Queues to GenericWorker (Worker Role) instances. Each job-table entity specifies a single reprojection job request; each task-table entity specifies a single reprojection task (i.e. a single tile). Tasks point into Reprojection Data Storage and Swath Source Data Storage and into two metadata tables: query SwathGranuleMeta for the geo-metadata (e.g. boundaries) of each swath tile, and query ScanTimeList for the list of satellite scan times that cover a target tile.]
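A minimal sketch of how a reprojection task might consult those two metadata tables, using plain dictionaries in place of Azure table queries; the keys and contents are illustrative assumptions:

    # Dictionary stand-ins for the ScanTimeList and SwathGranuleMeta tables.
    scan_time_list = {"h08v05": ["2010-03-01T10:30", "2010-03-01T12:05"]}
    swath_granule_meta = {"2010-03-01T10:30": {"bounds": (-120.0, 30.0, -110.0, 40.0)}}

    def plan_reprojection(target_tile):
        # 1. Which satellite scan times cover this target tile?
        scans = scan_time_list.get(target_tile, [])
        # 2. For each scan, fetch the swath geo-metadata (e.g. boundaries).
        return [(scan, swath_granule_meta.get(scan)) for scan in scans]

    print(plan_reprojection("h08v05"))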

• Computational costs are driven by the data scale and by the need to run reductions multiple times
• Storage costs are driven by the data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[The same pipeline diagram (AzureMODIS Service Web Role Portal, Request Queue, Download/Reprojection/Reduction 1/Reduction 2 Queues, Source Metadata, Source Imagery Download Sites, Scientific Results Download, Scientists), annotated with per-stage data volumes and costs:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420
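The CPU line items work out to $0.12 per worker-hour, which can be checked directly from the figures above:

    # Quick consistency check: per-stage CPU cost = instance-hours x $0.12.
    RATE_PER_INSTANCE_HOUR = 0.12
    for stage, hours in [("Reprojection", 3500),
                         ("Derivation Reduction", 1800),
                         ("Analysis Reduction", 1800)]:
        print(f"{stage}: {hours} h x ${RATE_PER_INSTANCE_HOUR}/h = ${hours * RATE_PER_INSTANCE_HOUR:.0f}")
    # Prints $420, $216, $216 -- matching the CPU figures above.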

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers


                                                                                                                                                                        ASPNET program hosted by a web role instance bull Submit jobs

                                                                                                                                                                        bull Track jobrsquos status and logs

                                                                                                                                                                        AuthenticationAuthorization based on Live ID

                                                                                                                                                                        The accepted job is stored into the job registry table bull Fault tolerance avoid in-memory states

                                                                                                                                                                        Web Portal

                                                                                                                                                                        Web Service

                                                                                                                                                                        Job registration

                                                                                                                                                                        Job Scheduler

                                                                                                                                                                        Job Portal

                                                                                                                                                                        Scaling Engine

                                                                                                                                                                        Job Registry

                                                                                                                                                                        R palustris as a platform for H2 production Eric Shadt SAGE Sam Phattarasukol Harwood Lab UW

                                                                                                                                                                        Blasted ~5000 proteins (700K sequences) bull Against all NCBI non-redundant proteins completed in 30 min

                                                                                                                                                                        bull Against ~5000 proteins from another strain completed in less than 30 sec

                                                                                                                                                                        AzureBLAST significantly saved computing timehellip

                                                                                                                                                                        Discovering Homologs bull Discover the interrelationships of known protein sequences

                                                                                                                                                                        ldquoAll against Allrdquo query bull The database is also the input query

                                                                                                                                                                        bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

                                                                                                                                                                        bull Theoretically 100 billion sequence comparisons

                                                                                                                                                                        Performance estimation bull Based on the sampling-running on one extra-large Azure instance

                                                                                                                                                                        bull Would require 3216731 minutes (61 years) on one desktop

                                                                                                                                                                        One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

                                                                                                                                                                        bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                                                                                                                                                                        bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                                                                                                                                                                        bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                                                                                                                                                                        bull Each segment consists of smaller partitions

                                                                                                                                                                        bull When load imbalances redistribute the load manually


• Total size of the output is ~230 GB

• The total number of hits is 1,764,579,487

• Started on March 25th; the last task completed on April 8th (~10 days of compute)

• But based on our estimates, the real working instance time should be 6-8 days

• Look into the log data to analyze what took place…


A normal log record should look like this:

Otherwise something is wrong (e.g., a task failed to complete); a small parsing sketch follows the sample records.

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
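A minimal sketch of the kind of log analysis implied here: find tasks that have an "Executing" record but no matching "is done" record. The exact log format is assumed from the sample lines above:

```python
import re

EXEC_RE = re.compile(r"Executing the task (\d+)")
DONE_RE = re.compile(r"Execution of task (\d+) is done")

def find_incomplete_tasks(log_lines):
    """Return task ids that started but never reported completion."""
    started, finished = set(), set()
    for line in log_lines:
        m = EXEC_RE.search(line)
        if m:
            started.add(m.group(1))
        m = DONE_RE.search(line)
        if m:
            finished.add(m.group(1))
    return started - finished

sample = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(find_incomplete_tasks(sample))   # {'251774'} -> this task went missing
```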

North Europe datacenter: 34,256 tasks processed in total

• All 62 compute nodes lost tasks and then came back as a group: this is an update domain (~30 mins, ~6 nodes per group)
• 35 nodes experienced a blob-writing failure at the same time

West Europe datacenter: 30,976 tasks completed, then the job was killed
• A reasonable guess: the fault domain mechanism was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = latent heat of vaporization (J g⁻¹)
Rn = net radiation (W m⁻²)
cp = specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = dry air density (kg m⁻³)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s⁻¹)
gs = conductivity of plant stoma, air (inverse of rs) (m s⁻¹)
γ = psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q \, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$
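A small numerical sketch of the Penman-Monteith formula as written above. The sample input values are illustrative only (rough mid-latitude daytime magnitudes), not taken from the MODISAzure pipeline, and λv is expressed here in J/kg so the result comes out as a water mass flux (kg m⁻² s⁻¹):

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                       gamma=66.0, lambda_v=2.45e6):
    """Penman-Monteith ET (per the formula above).

    delta    : slope of saturation specific humidity curve (Pa/K)
    r_n      : net radiation (W/m^2)
    rho_a    : dry air density (kg/m^3)
    c_p      : specific heat capacity of air (J/(kg K))
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : aerodynamic / stomatal conductivity (m/s)
    gamma    : psychrometric constant (Pa/K)
    lambda_v : latent heat of vaporization (J/kg)
    Returns water mass flux in kg m^-2 s^-1.
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator

# Illustrative values only: ~0.0001 kg/m^2/s, i.e. roughly 0.35 mm of water per hour.
print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1000.0, g_a=0.02, g_s=0.01))
```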

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) from plants.

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year (i.e., processing 20 years of US data is comparable in scale to processing 1 year of global data)

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: scientists interact with the AzureMODIS Service Web Role Portal; a Request Queue and Download Queue feed the Data Collection Stage, which pulls source imagery from the download sites and records source metadata; the Reprojection Queue feeds the Reprojection Stage, the Reduction 1 Queue feeds the Derivation Reduction Stage, and the Reduction 2 Queue feeds the Analysis Reduction Stage; scientific results are made available to scientists for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The MODISAzure Service is the Web Role front door
• Receives all user requests
• Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• The Service Monitor is a dedicated Worker Role
• Parses all job requests into tasks (recoverable units of work)
• Execution status of all jobs and tasks is persisted in Tables (a minimal sketch of this pattern follows)
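A minimal, schematic sketch of the pattern described above, using plain Python with in-memory stand-ins for the Azure job/task queues and status tables. The class and field names are assumptions for illustration, not the actual MODISAzure code or the Azure SDK:

```python
import uuid
from collections import deque

# In-memory stand-ins for Azure Queues and Tables (illustrative only).
job_queue = deque()          # <PipelineStage>Job Queue
task_queue = deque()         # <PipelineStage>Task Queue
job_status_table = {}        # <PipelineStage>JobStatus
task_status_table = {}       # <PipelineStage>TaskStatus

def web_role_submit(request):
    """Web Role front door: persist the request and enqueue a job."""
    job_id = str(uuid.uuid4())
    job_status_table[job_id] = {"request": request, "state": "Queued"}
    job_queue.append(job_id)
    return job_id

def service_monitor_step():
    """Service Monitor: parse one job into recoverable tasks and dispatch them."""
    if not job_queue:
        return
    job_id = job_queue.popleft()
    tiles = job_status_table[job_id]["request"]["tiles"]
    for tile in tiles:
        task_id = f"{job_id}:{tile}"
        task_status_table[task_id] = {"state": "Dispatched", "tile": tile}
        task_queue.append(task_id)
    job_status_table[job_id]["state"] = "TasksDispatched"

job = web_role_submit({"stage": "Reprojection", "tiles": ["h08v05", "h09v05"]})
service_monitor_step()
print(job_status_table[job]["state"], len(task_queue))  # TasksDispatched 2
```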

[Diagram: a <PipelineStage>Request reaches the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science (or other) executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files (see the staging sketch below)
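A schematic sketch of that marshalling pattern: download the input blobs to local files, run the sandboxed executable, and upload the outputs. The `blob_store` object, its `download`/`upload` methods, and the `out_*` naming convention are placeholders standing in for the real Azure blob client; no real Azure SDK calls are made:

```python
import subprocess
import tempfile
from pathlib import Path

def run_task(blob_store, input_blobs, output_prefix, exe_args):
    """Stage blobs to local disk, run the science executable, stage results back."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)

        # 1. Marshal inputs: blob storage -> local worker instance files.
        local_inputs = []
        for blob_name in input_blobs:
            local_path = workdir / Path(blob_name).name
            blob_store.download(blob_name, local_path)
            local_inputs.append(str(local_path))

        # 2. Run the sandboxed executable against the local files.
        subprocess.run(exe_args + local_inputs, cwd=workdir, check=True)

        # 3. Marshal outputs: local files -> blob storage.
        for out_file in workdir.glob("out_*"):
            blob_store.upload(f"{output_prefix}/{out_file.name}", out_file)
```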

[Diagram: the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a minimal retry-loop sketch follows)
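A minimal retry-loop sketch matching the description above. The in-memory queue and status table again stand in for Azure Queues/Tables, and `execute_task` is a placeholder for the real work:

```python
MAX_RETRIES = 3  # "retries failed tasks 3 times"

def generic_worker_loop(task_queue, task_status_table, execute_task):
    """Dequeue tasks, execute them, retry failures up to MAX_RETRIES, record status."""
    while task_queue:
        task_id = task_queue.popleft()
        status = task_status_table.setdefault(task_id, {"retries": 0})
        try:
            execute_task(task_id)
            status["state"] = "Done"
        except Exception as err:                  # the task failed
            if status["retries"] < MAX_RETRIES:
                status["retries"] += 1
                status["state"] = "Retrying"
                task_queue.append(task_id)        # requeue for another attempt
            else:
                status["state"] = f"Failed after {MAX_RETRIES} retries: {err}"
```

This loop can be plugged directly into the `task_queue` and `task_status_table` stand-ins from the earlier Service Monitor sketch.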

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus by the Service Monitor (Worker Role), which parses & persists ReprojectionTaskStatus and dispatches from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances execute the tasks, which point to Reprojection Data Storage, the ScanTimeList and SwathGranuleMeta tables, and Swath Source Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
• Swath Source Data Storage holds the source swath imagery
(a hypothetical query sketch follows)
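A hypothetical sketch of how a reprojection task might consult those two tables. The entity shapes, field names, and sample rows are assumptions for illustration; the real code would use the Azure Table storage API rather than in-memory lists:

```python
# Illustrative in-memory stand-ins for the two Azure Tables described above.
swath_granule_meta = [
    # Geo-metadata (e.g., bounding box) for each swath tile.
    {"granule": "swath-2010-03-31-0615", "lat_min": 34.0, "lat_max": 44.0,
     "lon_min": -124.0, "lon_max": -110.0},
]
scan_time_list = [
    # Satellite scan times that cover a target sinusoidal tile.
    {"target_tile": "h08v05", "scan_time": "2010-03-31T06:15:00Z"},
]

def scan_times_for_tile(tile_id):
    """Query: which satellite scan times cover this target tile?"""
    return [row["scan_time"] for row in scan_time_list
            if row["target_tile"] == tile_id]

def granules_overlapping(lat, lon):
    """Query: which swath granules' boundaries contain this point?"""
    return [row["granule"] for row in swath_granule_meta
            if row["lat_min"] <= lat <= row["lat_max"]
            and row["lon_min"] <= lon <= row["lon_max"]]

print(scan_times_for_tile("h08v05"))
print(granules_overlapping(37.5, -118.0))
```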

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Approximate per-stage scale and cost (over the pipeline shown above):
• Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; costs: $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3,500 compute hours, 20-100 workers; costs: $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files, 1,800 compute hours, 20-100 workers; costs: $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files, 1,800 compute hours, 20-100 workers; costs: $216 CPU, $2 download, $9 storage
Total: ~$1,420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                                                                                        • Day 2 - Applications

                                                                                                                                                                          R palustris as a platform for H2 production Eric Shadt SAGE Sam Phattarasukol Harwood Lab UW

                                                                                                                                                                          Blasted ~5000 proteins (700K sequences) bull Against all NCBI non-redundant proteins completed in 30 min

                                                                                                                                                                          bull Against ~5000 proteins from another strain completed in less than 30 sec

                                                                                                                                                                          AzureBLAST significantly saved computing timehellip

                                                                                                                                                                          Discovering Homologs bull Discover the interrelationships of known protein sequences

                                                                                                                                                                          ldquoAll against Allrdquo query bull The database is also the input query

                                                                                                                                                                          bull The protein database is large (42 GB size) bull Totally 9865668 sequences to be queried

                                                                                                                                                                          bull Theoretically 100 billion sequence comparisons

                                                                                                                                                                          Performance estimation bull Based on the sampling-running on one extra-large Azure instance

                                                                                                                                                                          bull Would require 3216731 minutes (61 years) on one desktop

                                                                                                                                                                          One of biggest BLAST jobs as far as we know bull This scale of experiments usually are infeasible to most scientists

                                                                                                                                                                          bull Allocated a total of ~4000 instances bull 475 extra-large VMs (8 cores per VM) four datacenters US (2) Western and North Europe

                                                                                                                                                                          bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                                                                                                                                                                          bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                                                                                                                                                                          bull Each segment consists of smaller partitions

                                                                                                                                                                          bull When load imbalances redistribute the load manually

                                                                                                                                                                          5

                                                                                                                                                                          0

                                                                                                                                                                          62 6

                                                                                                                                                                          2 6

                                                                                                                                                                          2 6

                                                                                                                                                                          2 6

                                                                                                                                                                          2 5

                                                                                                                                                                          0 62

                                                                                                                                                                          bull Total size of the output result is ~230GB

                                                                                                                                                                          bull The number of total hits is 1764579487

                                                                                                                                                                          bull Started at March 25th the last task completed on April 8th (10 days compute)

                                                                                                                                                                          bull But based our estimates real working instance time should be 6~8 day

                                                                                                                                                                          bull Look into log data to analyze what took placehellip


A normal log record pairs an "Executing the task" entry with a matching "Execution is done" entry; otherwise something is wrong (e.g. the task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
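A minimal sketch of this kind of log mining, assuming the reconstructed record format shown above (date, time, instance name, message); it pairs start and completion records and flags tasks that never report completion:

    import re
    from datetime import datetime

    LOG = """\
    3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
    3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
    3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
    """

    record = re.compile(r"(\d+/\d+/\d{4} \d+:\d+) (\S+) (Executing the task|Execution of task) (\d+)")

    started, finished = {}, {}
    for line in LOG.splitlines():
        m = record.match(line.strip())
        if not m:
            continue                              # skip anything that is not a task record
        when = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M")
        kind, task_id = m.group(3), m.group(4)
        (started if kind.startswith("Executing") else finished)[task_id] = when

    for task_id, begin in sorted(started.items()):
        if task_id in finished:
            minutes = (finished[task_id] - begin).total_seconds() / 60
            print(f"task {task_id}: {minutes:.1f} min")
        else:
            print(f"task {task_id}: no completion record, possible failure or node restart")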

North Europe data center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 minutes): this is an Update Domain being serviced
• 35 nodes experienced blob-writing failures at the same time

West Europe data center: 30,976 tasks were completed, and the job was killed
• A reasonable guess: a Fault Domain was at work

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ  = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J g^-1)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s^-1)
γ  = psychrometric constant (γ ≈ 66 Pa K^-1)

Penman-Monteith (1964):

    ET = (Δ Rn + ρa cp δq ga) / ((Δ + γ (1 + ga / gs)) λv)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
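A small numeric sketch of the Penman-Monteith form above; the function mirrors the formula directly, and the sample input values are invented placeholders rather than FLUXNET/MODIS data:

    def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s,
                           gamma=66.0, lambda_v=2450.0):
        """ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

        delta (Pa/K), r_n (W/m^2), rho_a (kg/m^3), c_p (J/(kg·K)),
        dq (Pa), g_a and g_s (m/s), gamma ≈ 66 Pa/K,
        lambda_v ≈ 2450 J/g (latent heat of vaporization near 20 °C).
        """
        return (delta * r_n + rho_a * c_p * dq * g_a) / \
               ((delta + gamma * (1.0 + g_a / g_s)) * lambda_v)

    # Example call with placeholder inputs:
    print(penman_monteith_et(delta=145.0, r_n=400.0, rho_a=1.2,
                             c_p=1005.0, dq=1000.0, g_a=0.02, g_s=0.01))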

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: the AzureMODIS Service Web Role Portal with its Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue connecting the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery download sites and Source Metadata feed the pipeline, and scientists download the scientific results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work to the <PipelineStage>Task Queue.]
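A compact sketch of that parse/persist/dispatch pattern, using in-memory stand-ins for the Azure queues and status tables; the class and field names are hypothetical, not the MODISAzure schema:

    from collections import deque
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class JobRequest:
        job_id: str
        stage: str                 # e.g. "Reprojection"
        tiles: List[str]           # work items that become individual tasks

    job_queue: deque = deque()     # <PipelineStage>Job Queue stand-in
    task_queue: deque = deque()    # <PipelineStage>Task Queue stand-in
    job_status: Dict[str, str] = {}    # <PipelineStage>JobStatus table stand-in
    task_status: Dict[str, str] = {}   # <PipelineStage>TaskStatus table stand-in

    def service_monitor_step() -> None:
        """Dequeue one job, persist its status, split it into recoverable tasks."""
        if not job_queue:
            return
        job = job_queue.popleft()
        job_status[job.job_id] = "Parsing"                    # persist job status
        for i, tile in enumerate(job.tiles):
            task_id = f"{job.job_id}-{i}"
            task_status[task_id] = "Queued"                   # persist task status
            task_queue.append((task_id, job.stage, tile))     # dispatch to task queue
        job_status[job.job_id] = "Dispatched"

    # Example: one reprojection job covering three tiles.
    job_queue.append(JobRequest("job-001", "Reprojection", ["h11v04", "h11v05", "h12v04"]))
    service_monitor_step()
    print(len(task_queue), "tasks dispatched;", job_status, task_status)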

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses jobs, persists <PipelineStage>TaskStatus, and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor (see the worker-loop sketch below)
• Retries failed tasks 3 times
• Maintains all task status
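A minimal sketch of that dequeue-and-retry loop; run_task and the in-memory queue are hypothetical stand-ins for the sandboxed executable and the Azure task queue:

    import random
    from collections import deque
    from typing import Dict

    MAX_ATTEMPTS = 3
    task_queue: deque = deque([("task-1", "reproject h11v04"), ("task-2", "reproject h11v05")])
    task_status: Dict[str, str] = {}

    def run_task(payload: str) -> None:
        """Stand-in for invoking the sandboxed science executable."""
        if random.random() < 0.3:                 # simulate an occasional failure
            raise RuntimeError(f"task failed: {payload}")

    def worker_loop() -> None:
        while task_queue:
            task_id, payload = task_queue.popleft()
            for attempt in range(1, MAX_ATTEMPTS + 1):
                try:
                    run_task(payload)
                    task_status[task_id] = "Done"
                    break
                except RuntimeError:
                    task_status[task_id] = f"Failed (attempt {attempt})"
            else:
                task_status[task_id] = "Abandoned after 3 attempts"

    worker_loop()
    print(task_status)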

[Diagram (reprojection example): a Reprojection Request enters the Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue; GenericWorker (Worker Role) instances execute the tasks against Reprojection Data Storage, with the task entities pointing to the ScanTimeList and SwathGranuleMeta tables.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile (see the lookup sketch below)
• Swath Source Data Storage holds the source swath imagery referenced by these tables
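A small sketch of that lookup pattern; the two dictionaries stand in for the ScanTimeList and SwathGranuleMeta tables, and their contents are invented for illustration:

    from typing import Dict, List

    scan_time_list: Dict[str, List[str]] = {        # target tile -> covering scan times
        "h11v04": ["2010-03-31T06:10", "2010-03-31T07:50"],
    }
    swath_granule_meta: Dict[str, Dict[str, float]] = {   # scan time -> swath boundaries
        "2010-03-31T06:10": {"lat_min": 40.0, "lat_max": 50.0, "lon_min": -80.0, "lon_max": -60.0},
        "2010-03-31T07:50": {"lat_min": 40.0, "lat_max": 50.0, "lon_min": -100.0, "lon_max": -80.0},
    }

    def swaths_for_tile(tile: str) -> List[Dict[str, float]]:
        """Return geo-metadata for every swath whose scan time covers the tile."""
        return [swath_granule_meta[t] for t in scan_time_list.get(tile, [])]

    print(swaths_for_tile("h11v04"))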

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Cost summary by pipeline stage (AzureMODIS Service Web Role Portal and queues as in the architecture diagram above):

Data collection stage (source imagery download and upload)
• 400-500 GB, 60K files
• 10 MB/sec, 11 hours, <10 workers
• $50 upload, $450 storage

Reprojection stage
• 400 GB, 45K files
• 3500 compute hours, 20-100 workers
• $420 CPU, $60 download

Derivation reduction stage
• 5-7 GB, 55K files
• 1800 compute hours, 20-100 workers
• $216 CPU, $1 download, $6 storage

Analysis reduction stage
• <10 GB, ~1K files
• 1800 compute hours, 20-100 workers
• $216 CPU, $2 download, $9 storage

Total: $1420
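These figures are mutually consistent with the Azure pricing of the time: assuming roughly $0.12 per small-instance compute hour, 3500 h × $0.12/h ≈ $420 and 1800 h × $0.12/h ≈ $216; and storing ~500 GB for the 6-month project at roughly $0.15 per GB-month gives ≈ $450.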

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                            bull 8 deployments of AzureBLAST bull Each deployment has its own co-located storage service

                                                                                                                                                                            bull Divide 10 million sequences into multiple segments bull Each will be submitted to one deployment as one job for execution

                                                                                                                                                                            bull Each segment consists of smaller partitions

                                                                                                                                                                            bull When load imbalances redistribute the load manually

                                                                                                                                                                            5

                                                                                                                                                                            0

                                                                                                                                                                            62 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 5

                                                                                                                                                                            0 62

                                                                                                                                                                            bull Total size of the output result is ~230GB

                                                                                                                                                                            bull The number of total hits is 1764579487

                                                                                                                                                                            bull Started at March 25th the last task completed on April 8th (10 days compute)

                                                                                                                                                                            bull But based our estimates real working instance time should be 6~8 day

                                                                                                                                                                            bull Look into log data to analyze what took placehellip

                                                                                                                                                                            5

                                                                                                                                                                            0

                                                                                                                                                                            62 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 6

                                                                                                                                                                            2 5

                                                                                                                                                                            0 62

A normal log record should look like the following; otherwise something is wrong (e.g., a task failed to complete):

3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins

(Note that task 251774 starts at 8:22 but never reports completion.)
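To make this check concrete, here is a minimal log-scanning sketch (not the original AzureBLAST tooling): it flags task ids that appear in an "Executing" record but never in an "is done" record, which is exactly the anomaly visible above around task 251774. The log format and field layout are assumed from the sample lines.

```python
# Minimal sketch: scan worker logs for tasks that started but never finished.
import re

EXEC_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<node>\S+) Executing the task (?P<task>\d+)")
DONE_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<node>\S+) Execution of task (?P<task>\d+) is done")

def find_incomplete_tasks(lines):
    """Return task ids that report 'Executing' but never 'is done'."""
    started, finished = set(), set()
    for line in lines:
        m = EXEC_RE.match(line)
        if m:
            started.add(m.group("task"))
            continue
        m = DONE_RE.match(line)
        if m:
            finished.add(m.group("task"))
    return sorted(started - finished)

log = [
    "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
    "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
    "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
]
print(find_incomplete_tasks(log))   # -> ['251774']
```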

North Europe Data Center: 34,256 tasks processed in total.
• All 62 compute nodes lost tasks and then came back in a group (~6 nodes per group, ~30 mins): this is an Update Domain at work.
• 35 nodes experienced a blob-writing failure at the same time.
West Europe Data Center: 30,976 tasks were completed, and then the job was killed.
• A reasonable guess: the Fault Domain is at work.

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma air (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: a big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

                                                                                                                                                                            Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration (evaporation through plant membranes) by plants.
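As a worked illustration of the Penman-Monteith formula above, here is a small numeric sketch; the function signature and the sample mid-day values are illustrative assumptions, not FLUXNET data or the MODISAzure code.

```python
# Numeric sketch of the Penman-Monteith form shown above; symbols follow the slide's list.
def penman_monteith_et(delta, R_n, rho_a, c_p, dq, g_a, g_s, gamma=66.0, lambda_v=2450.0):
    """ET = (delta*R_n + rho_a*c_p*dq*g_a) / ((delta + gamma*(1 + g_a/g_s)) * lambda_v)

    delta    : Pa/K     rate of change of saturation specific humidity with temperature
    R_n      : W/m^2    net radiation
    rho_a    : kg/m^3   dry air density
    c_p      : J/(kg K) specific heat capacity of air
    dq       : Pa       vapor pressure deficit
    g_a, g_s : m/s      conductivities of air and of plant stoma
    gamma    : Pa/K     psychrometric constant
    lambda_v : J/g      latent heat of vaporization
    With lambda_v in J/g, the result comes out in grams of water per m^2 per second.
    """
    return (delta * R_n + rho_a * c_p * dq * g_a) / ((delta + gamma * (1.0 + g_a / g_s)) * lambda_v)

# Illustrative call with made-up but plausible mid-day values:
print(penman_monteith_et(delta=145.0, R_n=400.0, rho_a=1.2, c_p=1005.0,
                         dq=1200.0, g_a=0.02, g_s=0.01))   # ~0.10 g m^-2 s^-1
```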

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)
20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a minimal nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
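For the reprojection stage's "simple nearest neighbor" option, a toy regridding sketch is shown below. It is an illustrative stand-in (a plain 2-D lat/lon regrid in NumPy), not the actual MODIS swath-to-sinusoidal reprojection.

```python
# Illustrative nearest-neighbor resampling: fill each destination cell with the value
# of the nearest source cell along each coordinate axis.
import numpy as np

def nearest_neighbor_regrid(src, src_lat, src_lon, dst_lat, dst_lon):
    """src: 2-D source values; the *_lat/*_lon arguments are 1-D coordinate axes."""
    lat_idx = np.abs(src_lat[:, None] - dst_lat[None, :]).argmin(axis=0)
    lon_idx = np.abs(src_lon[:, None] - dst_lon[None, :]).argmin(axis=0)
    return src[np.ix_(lat_idx, lon_idx)]

src = np.arange(16, dtype=float).reshape(4, 4)
out = nearest_neighbor_regrid(src,
                              src_lat=np.array([0.0, 1.0, 2.0, 3.0]),
                              src_lon=np.array([0.0, 1.0, 2.0, 3.0]),
                              dst_lat=np.array([0.2, 1.4, 2.9]),
                              dst_lon=np.array([0.4, 2.6]))
print(out)   # 3x2 array resampled from the 4x4 source
```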

[Architecture diagram: Scientists submit requests through the AzureMODIS Service Web Role Portal, which feeds a Request Queue. The Data Collection Stage pulls imagery from Source Imagery Download Sites (guided by Source Metadata), then hands off through the Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue, and Download Queue to the Reprojection Stage, Derivation Reduction Stage, and Analysis Reduction Stage; scientific results (science results download) flow back to the scientists.]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks: recoverable units of work (see the dispatch sketch below)
  • Execution status of all jobs and tasks is persisted in Tables
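The dispatch pattern just described (the front-door Web Role enqueues a job; the Service Monitor parses it into recoverable tasks and persists their status) can be sketched as follows. The real system uses Azure Queues and Tables; this sketch uses in-memory stand-ins, and the job/task field names are illustrative, not the actual MODISAzure schema.

```python
# Minimal in-memory sketch of the job -> task dispatch pattern.
import json
import queue

job_queue = queue.Queue()    # stands in for the <PipelineStage>Job Queue
task_queue = queue.Queue()   # stands in for the <PipelineStage>Task Queue
task_status = {}             # stands in for the <PipelineStage>TaskStatus table

def web_role_submit(job_request):
    """Front door: accept a user's job request and enqueue it."""
    job_queue.put(json.dumps(job_request))

def service_monitor_step():
    """Parse one job into recoverable per-tile tasks, persist status, dispatch."""
    job = json.loads(job_queue.get())
    for tile in job["tiles"]:
        task_id = f"{job['job_id']}-{tile}"
        task_status[task_id] = "Queued"          # persisted status record
        task_queue.put(json.dumps({"task_id": task_id, "tile": tile, "stage": job["stage"]}))

web_role_submit({"job_id": "reproj-42", "stage": "Reprojection", "tiles": ["h08v05", "h09v05"]})
service_monitor_step()
print(task_queue.qsize(), task_status)   # 2 tasks queued, both marked "Queued"
```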

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches tasks to the <PipelineStage>Task Queue.]

All work is actually done by a GenericWorker (Worker Role):
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a worker-loop sketch follows below)

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]
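A corresponding worker-loop sketch for the behaviour listed above: dequeue a task, run the sandboxed executable, retry up to 3 times, and record the final status. The queue and status table are in-memory stand-ins, and run_task is a placeholder for staging blobs and invoking the science executable.

```python
# Sketch of the GenericWorker loop with the 3-retry policy described above.
import json
import queue

MAX_ATTEMPTS = 3
task_queue = queue.Queue()   # stands in for the <PipelineStage>Task Queue
task_status = {}             # stands in for the <PipelineStage>TaskStatus table

def generic_worker_step(run_task):
    """Process one task from the queue, retrying failures up to MAX_ATTEMPTS times."""
    task = json.loads(task_queue.get())
    task_id = task["task_id"]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            task_status[task_id] = f"Running (attempt {attempt})"
            run_task(task)                 # placeholder: stage inputs from blob storage,
                                           # run the sandboxed executable, upload outputs
            task_status[task_id] = "Done"
            return
        except Exception as exc:
            task_status[task_id] = f"Failed attempt {attempt}: {exc}"
    task_status[task_id] = "Failed permanently"

task_queue.put(json.dumps({"task_id": "reproj-42-h08v05", "tile": "h08v05"}))
generic_worker_step(lambda task: None)     # a trivially succeeding task body
print(task_status)                         # {'reproj-42-h08v05': 'Done'}
```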

[Diagram: a Reprojection Request arrives at the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
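One plausible entity layout for the ReprojectionTaskStatus table, sketched as a dataclass; the partition-key/row-key choice (job id / tile id) and the field names are assumptions for illustration, not the published schema.

```python
# Hypothetical entity layout for a per-tile task-status record in table storage.
from dataclasses import dataclass, asdict

@dataclass
class ReprojectionTaskStatus:
    PartitionKey: str    # assumed: the job id, so all tasks of one job share a partition
    RowKey: str          # assumed: the tile id, one entity per sinusoidal tile
    Status: str          # e.g. Queued / Running / Done / Failed
    Attempts: int
    WorkerInstance: str  # e.g. RD00155D3611B0

entity = ReprojectionTaskStatus("reproj-42", "h08v05", "Queued", 0, "")
print(asdict(entity))    # dict form, ready to insert into a table store
```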

• Computational costs are driven by data scale and the need to run reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Same pipeline diagram as above (AzureMODIS Service Web Role Portal, Request / Reprojection / Reduction 1 / Reduction 2 / Download Queues, Source Imagery Download Sites, Source Metadata, Scientists and Scientific Results Download), annotated with per-stage data volumes and costs:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 CPU hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 CPU hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 CPU hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420
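A back-of-the-envelope model reproduces the per-stage figures fairly closely if one assumes 2010-era Azure rates of roughly $0.12 per compute-hour, $0.15 per GB-month of storage, $0.10 per GB upload, and $0.15 per GB download; these rates are an assumption for the sketch, not stated on the slide.

```python
# Rough cost model under assumed 2010-era Azure rates (not from the slide).
RATE_CPU_HOUR, RATE_GB_MONTH, RATE_GB_UP, RATE_GB_DOWN = 0.12, 0.15, 0.10, 0.15
PROJECT_MONTHS = 6

def stage_cost(cpu_hours=0, stored_gb=0, uploaded_gb=0, downloaded_gb=0):
    return (cpu_hours * RATE_CPU_HOUR
            + stored_gb * PROJECT_MONTHS * RATE_GB_MONTH
            + uploaded_gb * RATE_GB_UP
            + downloaded_gb * RATE_GB_DOWN)

print(stage_cost(cpu_hours=3500))                                  # ~ $420 (reprojection CPU)
print(stage_cost(stored_gb=500, uploaded_gb=500))                  # ~ $500 (data collection)
print(stage_cost(cpu_hours=1800, stored_gb=7, downloaded_gb=7))    # ~ $223 (derivation)
```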

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers



                                                                                                                                                                              bull Started at March 25th the last task completed on April 8th (10 days compute)

                                                                                                                                                                              bull But based our estimates real working instance time should be 6~8 day

                                                                                                                                                                              bull Look into log data to analyze what took placehellip

                                                                                                                                                                              5

                                                                                                                                                                              0

                                                                                                                                                                              62 6

                                                                                                                                                                              2 6

                                                                                                                                                                              2 6

                                                                                                                                                                              2 6

                                                                                                                                                                              2 5

                                                                                                                                                                              0 62

                                                                                                                                                                              A normal log record should be

                                                                                                                                                                              Otherwise something is wrong (eg task failed to complete)

                                                                                                                                                                              3312010 614 RD00155D3611B0 Executing the task 251523

                                                                                                                                                                              3312010 625 RD00155D3611B0 Execution of task 251523 is done it took 109mins

                                                                                                                                                                              3312010 625 RD00155D3611B0 Executing the task 251553

                                                                                                                                                                              3312010 644 RD00155D3611B0 Execution of task 251553 is done it took 193mins

                                                                                                                                                                              3312010 644 RD00155D3611B0 Executing the task 251600

                                                                                                                                                                              3312010 702 RD00155D3611B0 Execution of task 251600 is done it took 1727 mins

                                                                                                                                                                              3312010 822 RD00155D3611B0 Executing the task 251774

                                                                                                                                                                              3312010 950 RD00155D3611B0 Executing the task 251895

                                                                                                                                                                              3312010 1112 RD00155D3611B0 Execution of task 251895 is done it took 82 mins

                                                                                                                                                                              North Europe Data Center totally 34256 tasks processed

                                                                                                                                                                              All 62 compute nodes lost tasks

                                                                                                                                                                              and then came back in a group

                                                                                                                                                                              This is an Update domain

                                                                                                                                                                              ~30 mins

                                                                                                                                                                              ~ 6 nodes in one group

                                                                                                                                                                              35 Nodes experience blob

                                                                                                                                                                              writing failure at same

                                                                                                                                                                              time

                                                                                                                                                                              West Europe Datacenter 30976 tasks are completed and job was killed

                                                                                                                                                                              A reasonable guess the

                                                                                                                                                                              Fault Domain is working

                                                                                                                                                                              MODISAzure Computing Evapotranspiration (ET) in The Cloud

                                                                                                                                                                              You never miss the water till the well has run dry Irish Proverb

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma air (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs: big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration, or evaporation through plant membranes, by plants.
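As a worked example, here is a minimal sketch of the Penman-Monteith calculation exactly as written above, with inputs in the units listed on the slide. Variable names are my own; this is illustrative, not the MODISAzure reduction code:

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s, lambda_v, gamma=66.0):
    """Penman-Monteith (1964) as given above:
    ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

    delta    Δ  : d(saturation specific humidity)/dT   (Pa K⁻¹)
    r_n      Rn : net radiation                        (W m⁻²)
    rho_a    ρa : dry air density                      (kg m⁻³)
    c_p      cp : specific heat capacity of air        (J kg⁻¹ K⁻¹)
    dq       δq : vapor pressure deficit               (Pa)
    g_a      ga : conductivity of air                  (m s⁻¹)
    g_s      gs : conductivity of plant stoma          (m s⁻¹)
    lambda_v λv : latent heat of vaporization          (J g⁻¹)
    gamma    γ  : psychrometric constant               (≈ 66 Pa K⁻¹)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator
```

In the pipeline this calculation runs once per pixel per time step, which is why the input data reduction matters so much.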

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Architecture diagram: AzureMODIS Service Web Role portal; Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues; Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery download sites; source metadata; scientists download the scientific results.]

                                                                                                                                                                              httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue (see the routing sketch below)
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks - recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables
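A minimal sketch of that front-door routing, using hypothetical queue names and a generic enqueue helper rather than the actual MODISAzure service code:

```python
# Hypothetical job-to-queue routing for the Web Role front door.
# Queue names and the enqueue() helper are illustrative, not the real service API.
JOB_QUEUES = {
    "Download": "download-job-queue",
    "Reprojection": "reprojection-job-queue",
    "Reduction1": "reduction1-job-queue",
    "Reduction2": "reduction2-job-queue",
}

def route_job_request(job_type: str, job_payload: dict, enqueue) -> str:
    """Queue a user request onto the appropriate <PipelineStage> Job Queue."""
    try:
        queue_name = JOB_QUEUES[job_type]
    except KeyError:
        raise ValueError(f"Unknown job type: {job_type!r}")
    enqueue(queue_name, job_payload)  # e.g. a thin wrapper over Azure Queue storage
    return queue_name
```

The Web Role does no heavy lifting itself; everything past this point is asynchronous, queue-driven work.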

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role) and is placed on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) persists <PipelineStage>JobStatus, parses and persists <PipelineStage>TaskStatus entries, and dispatches work onto the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role:
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pick up the tasks and read the corresponding <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a sketch of this loop follows)
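A minimal sketch of that dequeue/execute/retry loop, including the blob-marshalling step described above. The task_queue, blob_store, and task_status_table objects are hypothetical stand-ins, not the actual Azure SDK calls used by MODISAzure:

```python
import subprocess
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 3  # the slide's "retries failed tasks 3 times"

def worker_loop(task_queue, blob_store, task_status_table, executable):
    """GenericWorker-style loop: dequeue a task, stage inputs from blob storage
    to local instance files, run the sandboxed executable, upload outputs,
    and record task status."""
    while True:
        task = task_queue.dequeue()  # hypothetical: returns None when the queue is empty
        if task is None:
            break
        for attempt in range(1, MAX_ATTEMPTS + 1):
            workdir = Path(tempfile.mkdtemp(prefix=f"task-{task['id']}-"))
            try:
                # Marshal inputs: blob storage -> local instance files
                for blob_name in task["inputs"]:
                    blob_store.download(blob_name, workdir / Path(blob_name).name)
                # Sandbox the science executable against the local files
                # (assumes it writes its results into an "out" subdirectory)
                subprocess.run([executable, str(workdir)], check=True)
                # Marshal outputs: local instance files -> blob storage
                for out_file in (workdir / "out").glob("*"):
                    blob_store.upload(f"{task['id']}/{out_file.name}", out_file)
                task_status_table.update(task["id"], status="Done", attempt=attempt)
                break
            except Exception as exc:
                task_status_table.update(task["id"], status="Failed",
                                         attempt=attempt, error=str(exc))
```

Because every attempt is recorded in the status table, the log analysis earlier in this section can reconstruct exactly which tasks were lost and re-run.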

[Diagram (reprojection example): a Reprojection Request is placed on the Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches onto the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage, using the ScanTimeList and SwathGranuleMeta tables.]
• Each job entity specifies a single reprojection job request
• Each task entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Architecture diagram (AzureMODIS Service Web Role portal, queues, and stages) repeated with per-stage scale and cost figures:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420
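The per-stage compute figures above are consistent with a flat price of about $0.12 per instance-hour (3500 h × $0.12 = $420, 1800 h × $0.12 = $216), roughly the small-instance rate at the time. A minimal sketch of that cost arithmetic, with the rate treated as an assumption:

```python
# Back-of-the-envelope check of the per-stage compute costs above.
# ASSUMPTION: a flat rate of $0.12 per instance-hour (implied by 3500 h -> $420).
RATE_PER_INSTANCE_HOUR = 0.12

stage_hours = {
    "Reprojection": 3500,
    "Derivation Reduction": 1800,
    "Analysis Reduction": 1800,
}

for stage, hours in stage_hours.items():
    cost = hours * RATE_PER_INSTANCE_HOUR
    print(f"{stage}: {hours} h x ${RATE_PER_INSTANCE_HOUR}/h = ${cost:.0f}")
# Reprojection: 3500 h x $0.12/h = $420
# Derivation Reduction: 1800 h x $0.12/h = $216
# Analysis Reduction: 1800 h x $0.12/h = $216
```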

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                              • Day 2 - Azure as a PaaS
                                                                                                                                                                              • Day 2 - Applications


                                                                                                                                                                                3312010 950 RD00155D3611B0 Executing the task 251895

                                                                                                                                                                                3312010 1112 RD00155D3611B0 Execution of task 251895 is done it took 82 mins

                                                                                                                                                                                North Europe Data Center totally 34256 tasks processed

                                                                                                                                                                                All 62 compute nodes lost tasks

                                                                                                                                                                                and then came back in a group

                                                                                                                                                                                This is an Update domain

                                                                                                                                                                                ~30 mins

                                                                                                                                                                                ~ 6 nodes in one group

                                                                                                                                                                                35 Nodes experience blob

                                                                                                                                                                                writing failure at same

                                                                                                                                                                                time

                                                                                                                                                                                West Europe Datacenter 30976 tasks are completed and job was killed

                                                                                                                                                                                A reasonable guess the

                                                                                                                                                                                Fault Domain is working

                                                                                                                                                                                MODISAzure Computing Evapotranspiration (ET) in The Cloud

                                                                                                                                                                                You never miss the water till the well has run dry Irish Proverb

                                                                                                                                                                                ET = Water volume evapotranspired (m3 s-1 m-2)

                                                                                                                                                                                Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                                                                                                                                                                                λv = Latent heat of vaporization (Jg)

                                                                                                                                                                                Rn = Net radiation (W m-2)

                                                                                                                                                                                cp = Specific heat capacity of air (J kg-1 K-1)

                                                                                                                                                                                ρa = dry air density (kg m-3)

                                                                                                                                                                                δq = vapor pressure deficit (Pa)

                                                                                                                                                                                ga = Conductivity of air (inverse of ra) (m s-1)

                                                                                                                                                                                gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

                                                                                                                                                                                γ = Psychrometric constant (γ asymp 66 Pa K-1)

                                                                                                                                                                                Estimating resistanceconductivity across a

                                                                                                                                                                                catchment can be tricky

                                                                                                                                                                                bull Lots of inputs big data reduction

                                                                                                                                                                                bull Some of the inputs are not so simple

                                                                                                                                                                                119864119879 = ∆119877119899 + 120588119886 119888119901 120575119902 119892119886

                                                                                                                                                                                (∆ + 120574 1 + 119892119886 119892119904 )120582120592

                                                                                                                                                                                Penman-Monteith (1964)

                                                                                                                                                                                Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms (a minimal nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
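The reprojection stage's simplest option, nearest-neighbor resampling, can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the MODISAzure reprojection code; it assumes the caller already has, for every target (sinusoidal) pixel, the fractional source-tile coordinates produced by the projection math.

    import numpy as np

    def nearest_neighbor_resample(source, src_rows, src_cols, fill_value=np.nan):
        """Resample a source tile onto a target grid by nearest neighbor.

        source            : 2-D array of source-tile pixel values
        src_rows, src_cols: 2-D arrays (target shape) giving, for each target
                            pixel, its fractional coordinates in the source tile
        fill_value        : value used where a target pixel falls outside the source
        """
        rows = np.rint(src_rows).astype(int)
        cols = np.rint(src_cols).astype(int)
        inside = ((rows >= 0) & (rows < source.shape[0]) &
                  (cols >= 0) & (cols < source.shape[1]))
        result = np.full(src_rows.shape, fill_value, dtype=float)
        result[inside] = source[rows[inside], cols[inside]]
        return result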

[Pipeline diagram: the AzureMODIS Service Web Role Portal feeds a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 Queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery comes from the Source Imagery Download Sites, with source metadata in Tables; scientists download the science results.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
  • Execution status of all jobs and tasks is persisted in Tables (see the sketch below)
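A rough sketch of that division of labor, using plain in-memory Python queues and dicts as stand-ins for the Azure Job/Task Queues and status Tables. All names and fields here are hypothetical illustration, not the MODISAzure code or the Azure SDK.

    import queue
    import uuid

    # Hypothetical in-memory stand-ins for a <PipelineStage> Job/Task Queue and status Tables.
    reprojection_job_queue = queue.Queue()
    reprojection_task_queue = queue.Queue()
    job_status_table = {}    # job id  -> status record
    task_status_table = {}   # task id -> status record

    def web_role_submit(job_request):
        """Front door: persist JobStatus and enqueue the request (no heavy work here)."""
        job_id = str(uuid.uuid4())
        job_status_table[job_id] = {"request": job_request, "state": "Queued"}
        reprojection_job_queue.put({"job_id": job_id, **job_request})
        return job_id

    def service_monitor_step():
        """Dedicated worker: parse one job into per-tile tasks, the recoverable units of work."""
        job = reprojection_job_queue.get()
        for tile in job["tiles"]:
            task_id = f"{job['job_id']}/{tile}"
            task_status_table[task_id] = {"state": "Dispatched"}
            reprojection_task_queue.put({"task_id": task_id, "tile": tile})
        job_status_table[job["job_id"]]["state"] = "TasksDispatched"

    # Example: one job covering three sinusoidal tiles becomes three tasks.
    web_role_submit({"tiles": ["h08v05", "h09v05", "h10v05"]})
    service_monitor_step()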

[Diagram: a <PipelineStage> request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches work items to the <PipelineStage>Task Queue.]

All work is actually done by a Worker Role
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (see the worker-loop sketch below)
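A GenericWorker's per-task loop might look roughly like the sketch below. The helper callables (fetch_blobs_to_local, run_sandboxed, upload_results_to_blob) are hypothetical stand-ins for the blob marshalling and sandboxed execution described above, not real MODISAzure or Azure SDK functions.

    MAX_ATTEMPTS = 3  # the deck says failed tasks are retried 3 times

    def generic_worker_step(task_queue, task_status_table, fetch_blobs_to_local,
                            run_sandboxed, upload_results_to_blob):
        """Process one task end to end; the helper callables are hypothetical stand-ins."""
        task = task_queue.get()
        task_id = task["task_id"]
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                local_inputs = fetch_blobs_to_local(task)          # blob storage -> local instance files
                local_outputs = run_sandboxed(task, local_inputs)  # sandboxed science executable
                upload_results_to_blob(task, local_outputs)        # local files -> blob storage
                task_status_table[task_id] = {"state": "Done", "attempts": attempt}
                return
            except Exception as err:  # illustrative only; real code would be more selective
                task_status_table[task_id] = {"state": "Retrying", "attempts": attempt,
                                              "error": str(err)}
        task_status_table[task_id]["state"] = "Failed"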

[Diagram: a Reprojection Request is persisted as ReprojectionJobStatus; the Service Monitor (Worker Role) parses & persists ReprojectionTaskStatus and dispatches entries from the Job Queue to the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage; task entities point to the ScanTimeList and SwathGranuleMeta tables.]
• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e., a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile (a sketch of these entity shapes follows)
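One way to picture the four tables is one entity shape per table. The dataclasses below are a hypothetical sketch based only on the descriptions above; the real MODISAzure schema and field names are not shown in this deck.

    from dataclasses import dataclass

    @dataclass
    class ReprojectionJobStatus:
        """One entity per reprojection job request."""
        job_id: str
        target_tiles: str        # e.g. a serialized list of sinusoidal tile ids (assumed field)
        state: str               # Queued / TasksDispatched / Done / Failed (assumed states)

    @dataclass
    class ReprojectionTaskStatus:
        """One entity per reprojection task, i.e. per single target tile."""
        job_id: str
        task_id: str
        tile_id: str
        state: str
        attempts: int

    @dataclass
    class SwathGranuleMeta:
        """Geo-metadata (e.g. boundaries) for each source swath tile."""
        granule_id: str
        min_lat: float
        max_lat: float
        min_lon: float
        max_lon: float

    @dataclass
    class ScanTimeList:
        """Satellite scan times that cover a given target tile."""
        tile_id: str
        scan_times: str          # e.g. a serialized list of timestamps (assumed field)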

• Computational costs driven by data scale and the need to run reduction multiple times
• Storage costs driven by data scale and the 6-month project duration
• Small with respect to the people costs, even at graduate student rates

[Cost/scale annotations on the pipeline diagram (AzureMODIS Service Web Role Portal, Request/Download/Reprojection/Reduction Queues, and stages as above):]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers – $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers – $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers – $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers – $216 CPU, $2 download, $9 storage

Total: $1420

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                                • Day 2 - Azure as a PaaS
                                                                                                                                                                                • Day 2 - Applications

A normal log record should be:
3/31/2010 6:14 RD00155D3611B0 Executing the task 251523
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins

Otherwise something is wrong (e.g., a task failed to complete):
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
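Spotting tasks that start but never report completion (like task 251774 above) is easy to automate once the records are cleaned up. The sketch below assumes the record wording shown above; it is illustrative, not a tool used in the project.

    import re

    START = re.compile(r"Executing the task (\d+)")
    DONE = re.compile(r"Execution of task (\d+) is done")

    def find_unfinished_tasks(log_lines):
        """Return, sorted, the task ids that were started but never reported done."""
        started, finished = set(), set()
        for line in log_lines:
            done = DONE.search(line)
            if done:
                finished.add(done.group(1))
                continue
            start = START.search(line)
            if start:
                started.add(start.group(1))
        return sorted(started - finished)

    log = [
        "3/31/2010 8:22 RD00155D3611B0 Executing the task 251774",
        "3/31/2010 9:50 RD00155D3611B0 Executing the task 251895",
        "3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins",
    ]
    print(find_unfinished_tasks(log))   # ['251774'] -> the task that never completed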

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back in a group (~30 mins, ~6 nodes in one group) – this is an update domain
• 35 nodes experienced a blob-writing failure at the same time
West Europe Data Center: 30,976 tasks were completed and the job was killed
• A reasonable guess: the fault domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky

• Lots of inputs: big data reduction
• Some of the inputs are not so simple

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs))·λv)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies, and by transpiration or evaporation through plant membranes by plants.

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate result sinusoidal tiles
• Simple nearest neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

[Pipeline diagram: the AzureMODIS Service Web Role Portal feeds a Request Queue; Download, Reprojection, Reduction 1, and Reduction 2 Queues drive the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages; source imagery comes from the Source Imagery Download Sites, with source metadata in Tables; scientists download the science results.]

                                                                                                                                                                                  httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

• The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks (recoverable units of work), as sketched below
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places a message on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work items to the <PipelineStage>Task Queue.]
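As a rough sketch of this dispatch pattern (not the actual MODISAzure code), the snippet below shows how a Web Role front door might enqueue a job request and how a Service Monitor could expand it into per-tile task messages using the Windows Azure StorageClient queue API of that era; the queue names, message format, and tile list are hypothetical.

```csharp
// Sketch only: queue names and the message format are hypothetical and are not
// taken from the actual MODISAzure implementation.
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class DispatchSketch
{
    // Web Role front door: queue an incoming job request to its Job Queue.
    public static void EnqueueReprojectionJob(CloudQueueClient queues, string jobId)
    {
        CloudQueue jobQueue = queues.GetQueueReference("reprojection-job-queue");
        jobQueue.CreateIfNotExist();
        jobQueue.AddMessage(new CloudQueueMessage(jobId));
    }

    // Service Monitor: parse one job request into recoverable task units.
    public static void MonitorOnce(CloudQueueClient queues)
    {
        CloudQueue jobQueue = queues.GetQueueReference("reprojection-job-queue");
        CloudQueue taskQueue = queues.GetQueueReference("reprojection-task-queue");
        taskQueue.CreateIfNotExist();

        CloudQueueMessage job = jobQueue.GetMessage(TimeSpan.FromMinutes(5));
        if (job == null) return;

        // One task per target sinusoidal tile; the real tile lookup is omitted.
        foreach (string tileId in new[] { "h08v05", "h09v05" })
        {
            taskQueue.AddMessage(new CloudQueueMessage(job.AsString + ":" + tileId));
        }

        // Job and task status rows would also be persisted to Azure Tables here.
        jobQueue.DeleteMessage(job);
    }
}
```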

All work actually done by a Worker Role:
• Sandboxes the science (or other) executable
• Marshalls all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status (a sketch of this loop follows below)
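A minimal sketch of this dequeue-and-retry behaviour, assuming the same hypothetical task queue as in the earlier snippet; the three-attempt limit mirrors the bullet above, and the marshalling and executable steps are only stubs.

```csharp
// Sketch only: RunScienceExecutable and the Table status updates are stubs.
using System;
using Microsoft.WindowsAzure.StorageClient;

public static class GenericWorkerSketch
{
    public static void WorkerLoopOnce(CloudQueue taskQueue)
    {
        // Hide the message for 30 minutes while the task is being processed.
        CloudQueueMessage task = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
        if (task == null) return;

        if (task.DequeueCount > 3)
        {
            // Failed three times already: record the failure and drop the task.
            taskQueue.DeleteMessage(task);
            return;
        }

        try
        {
            // 1. Marshall inputs from blob storage to local instance files.
            // 2. Run the sandboxed science executable against the local files.
            // 3. Marshall outputs back to blob storage; persist task status.
            RunScienceExecutable(task.AsString);
            taskQueue.DeleteMessage(task);
        }
        catch (Exception)
        {
            // Leave the message alone: it becomes visible again after the
            // timeout and is retried, until the DequeueCount check above fires.
        }
    }

    private static void RunScienceExecutable(string taskDescription)
    {
        // Stub for the sandboxed executable invocation.
    }
}
```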

[Diagram: a Reprojection Request arrives at the Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue, from which GenericWorker (Worker Role) instances pull tasks. Task entities point to the ScanTimeList and SwathGranuleMeta tables, to Reprojection Data Storage, and to Swath Source Data Storage.]

• Each ReprojectionJobStatus entity specifies a single reprojection job request
• Each ReprojectionTaskStatus entity specifies a single reprojection task (i.e. a single tile)
• Query the SwathGranuleMeta table to get geo-metadata (e.g. boundaries) for each swath tile
• Query the ScanTimeList table to get the list of satellite scan times that cover a target tile
(see the entity sketch below)
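The table layout above might be modelled roughly as follows with the TableServiceEntity base class from the StorageClient library; the entity properties are illustrative assumptions, not the real MODISAzure schema.

```csharp
// Illustrative only: the property names are guesses at what such entities
// might contain; this is not the actual MODISAzure table schema.
using Microsoft.WindowsAzure.StorageClient;

// One entity per reprojection job request.
public class ReprojectionJobStatus : TableServiceEntity
{
    public string TargetTiles { get; set; }
    public string State { get; set; }      // e.g. Queued, Running, Completed
}

// One entity per reprojection task, i.e. a single sinusoidal tile.
public class ReprojectionTaskStatus : TableServiceEntity
{
    public string JobId { get; set; }
    public string TileId { get; set; }
    public int AttemptCount { get; set; }
}

// Geo-metadata (e.g. boundaries) for each source swath tile.
public class SwathGranuleMeta : TableServiceEntity
{
    public double North { get; set; }
    public double South { get; set; }
    public double East { get; set; }
    public double West { get; set; }
}

// Satellite scan times that cover a target tile.
public class ScanTimeList : TableServiceEntity
{
    public string TileId { get; set; }
    public string ScanTimes { get; set; } // e.g. comma-separated UTC timestamps
}
```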

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates

[Diagram: the AzureMODIS Service Web Role Portal pipeline annotated with per-stage data volumes and costs.]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
Total: $1420
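The compute charges are consistent with the small-instance rate of roughly $0.12 per CPU-hour that Windows Azure charged at the time (the rate itself is an assumption here, not stated on the slide):

$$3500\ \text{h} \times \$0.12/\text{h} = \$420, \qquad 1800\ \text{h} \times \$0.12/\text{h} = \$216$$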

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premise compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

• Day 2 - Azure as a PaaS
• Day 2 - Applications

North Europe Data Center: 34,256 tasks processed in total
• All 62 compute nodes lost tasks and then came back as a group (~6 nodes per group, ~30 minutes); this is an Update Domain
• 35 nodes experienced blob-writing failures at the same time

West Europe Data Center: 30,976 tasks were completed and the job was killed
• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry" (Irish proverb)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration (evaporation through plant membranes) by plants.

Penman-Monteith (1964):

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\left(1 + g_a/g_s\right)\right)\lambda_v}$$

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stomata (inverse of rs) (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; a big data reduction
• Some of the inputs are not so simple
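As a concrete illustration of the per-cell reduction computation, the sketch below evaluates the Penman-Monteith expression for one set of input values; the sample numbers are arbitrary placeholders (not FLUXNET data), and the caller is responsible for supplying units consistent with the definitions above.

```csharp
using System;

public static class PenmanMonteith
{
    // Evaluates ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)
    // exactly as written on the slide; units follow the definitions above.
    public static double Evapotranspiration(
        double delta,        // Δ  : d(saturation specific humidity)/dT (Pa/K)
        double rn,           // Rn : net radiation (W/m^2)
        double rhoA,         // ρa : dry air density (kg/m^3)
        double cp,           // cp : specific heat capacity of air (J/(kg K))
        double dq,           // δq : vapor pressure deficit (Pa)
        double ga,           // ga : conductivity of air (m/s)
        double gs,           // gs : conductivity of plant stomata (m/s)
        double lambdaV,      // λv : latent heat of vaporization (J/g)
        double gamma = 66.0) // γ  : psychrometric constant (Pa/K)
    {
        return (delta * rn + rhoA * cp * dq * ga)
             / ((delta + gamma * (1.0 + ga / gs)) * lambdaV);
    }

    public static void Main()
    {
        // Arbitrary placeholder inputs, only to show the call shape.
        double et = Evapotranspiration(
            delta: 145.0, rn: 400.0, rhoA: 1.2, cp: 1005.0,
            dq: 1200.0, ga: 0.02, gs: 0.01, lambdaV: 2450.0);
        Console.WriteLine("ET = " + et);
    }
}
```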

• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA ftp sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, virtual sensors


                                                                                                                                                                                    boundaries) for each swath

                                                                                                                                                                                    tile

                                                                                                                                                                                    Query this table to get the

                                                                                                                                                                                    list of satellite scan times

                                                                                                                                                                                    that cover a target tile

                                                                                                                                                                                    Swath Source

                                                                                                                                                                                    Data Storage

                                                                                                                                                                                    bull Computational costs driven by data scale and need to run reduction multiple times

                                                                                                                                                                                    bull Storage costs driven by data scale and 6 month project duration

                                                                                                                                                                                    bull Small with respect to the people costs even at graduate student rates

                                                                                                                                                                                    Reduction 1 Queue

                                                                                                                                                                                    Source Metadata

                                                                                                                                                                                    Request Queue

                                                                                                                                                                                    Scientific Results Download

                                                                                                                                                                                    Data Collection Stage

                                                                                                                                                                                    Source Imagery Download Sites

                                                                                                                                                                                    Reprojection Queue

                                                                                                                                                                                    Reduction 2 Queue

                                                                                                                                                                                    Download

                                                                                                                                                                                    Queue

                                                                                                                                                                                    Scientists

                                                                                                                                                                                    Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

                                                                                                                                                                                    400-500 GB

                                                                                                                                                                                    60K files

                                                                                                                                                                                    10 MBsec

                                                                                                                                                                                    11 hours

                                                                                                                                                                                    lt10 workers

                                                                                                                                                                                    $50 upload

                                                                                                                                                                                    $450 storage

                                                                                                                                                                                    400 GB

                                                                                                                                                                                    45K files

                                                                                                                                                                                    3500 hours

                                                                                                                                                                                    20-100

                                                                                                                                                                                    workers

                                                                                                                                                                                    5-7 GB

                                                                                                                                                                                    55K files

                                                                                                                                                                                    1800 hours

                                                                                                                                                                                    20-100

                                                                                                                                                                                    workers

                                                                                                                                                                                    lt10 GB

                                                                                                                                                                                    ~1K files

                                                                                                                                                                                    1800 hours

                                                                                                                                                                                    20-100

                                                                                                                                                                                    workers

                                                                                                                                                                                    $420 cpu

                                                                                                                                                                                    $60 download

                                                                                                                                                                                    $216 cpu

                                                                                                                                                                                    $1 download

                                                                                                                                                                                    $6 storage

                                                                                                                                                                                    $216 cpu

                                                                                                                                                                                    $2 download

                                                                                                                                                                                    $9 storage

                                                                                                                                                                                    AzureMODIS

                                                                                                                                                                                    Service Web Role Portal

                                                                                                                                                                                    Total $1420

                                                                                                                                                                                    bull Clouds are the largest scale computer centers ever constructed and have the

                                                                                                                                                                                    potential to be important to both large and small scale science problems

                                                                                                                                                                                    bull Equally import they can increase participation in research providing needed

                                                                                                                                                                                    resources to userscommunities without ready access

                                                                                                                                                                                    bull Clouds suitable for ldquoloosely coupledrdquo data parallel applications and can

                                                                                                                                                                                    support many interesting ldquoprogramming patternsrdquo but tightly coupled low-

                                                                                                                                                                                    latency applications do not perform optimally on clouds today

                                                                                                                                                                                    bull Provide valuable fault tolerance and scalability abstractions

                                                                                                                                                                                    bull Clouds as amplifier for familiar client tools and on premise compute

                                                                                                                                                                                    bull Clouds services to support research provide considerable leverage for both

                                                                                                                                                                                    individual researchers and entire communities of researchers

                                                                                                                                                                                    • Day 2 - Azure as a PaaS
                                                                                                                                                                                    • Day 2 - Applications

Observations from a large run:
• 35 nodes experienced a blob-writing failure at the same time
• West Europe data center: 30,976 tasks had completed when the job was killed
• A reasonable guess: the Fault Domain is working

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry." – Irish proverb

Penman-Monteith (1964):

    ET = \frac{\Delta R_n + \rho_a c_p \, \delta q \, g_a}{\left( \Delta + \gamma \left( 1 + g_a / g_s \right) \right) \lambda_v}

where:
• ET = water volume evapotranspired (m3 s-1 m-2)
• Δ = rate of change of saturation specific humidity with air temperature (Pa K-1)
• λv = latent heat of vaporization (J g-1)
• Rn = net radiation (W m-2)
• cp = specific heat capacity of air (J kg-1 K-1)
• ρa = dry air density (kg m-3)
• δq = vapor pressure deficit (Pa)
• ga = conductivity of air (inverse of ra) (m s-1)
• gs = conductivity of plant stomata (inverse of rs) (m s-1)
• γ = psychrometric constant (γ ≈ 66 Pa K-1)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants.
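To make the derivation concrete, here is a small sketch that evaluates the Penman-Monteith formula above for a single set of inputs. It is illustrative only: the method name and the sample numbers are assumptions, not part of the MODISAzure code, and the inputs must be supplied in the units listed above.

```csharp
using System;

static class PenmanMonteith
{
    // Evaluates the Penman-Monteith formula above for one pixel.
    // Inputs use the units from the variable list; the return value
    // follows the formula as stated (water volume evapotranspired).
    static double ComputeEvapotranspiration(
        double delta,   // Δ: d(saturation specific humidity)/dT (Pa K-1)
        double rn,      // Rn: net radiation (W m-2)
        double rhoA,    // ρa: dry air density (kg m-3)
        double cp,      // cp: specific heat capacity of air (J kg-1 K-1)
        double dq,      // δq: vapor pressure deficit (Pa)
        double ga,      // ga: conductivity of air (m s-1)
        double gs,      // gs: conductivity of plant stomata (m s-1)
        double gamma,   // γ: psychrometric constant (Pa K-1)
        double lambdaV) // λv: latent heat of vaporization (J g-1)
    {
        double numerator   = delta * rn + rhoA * cp * dq * ga;
        double denominator = (delta + gamma * (1.0 + ga / gs)) * lambdaV;
        return numerator / denominator;
    }

    static void Main()
    {
        // Hypothetical mid-day values for a vegetated pixel, for illustration only.
        double et = ComputeEvapotranspiration(
            delta: 145, rn: 400, rhoA: 1.2, cp: 1005,
            dq: 1200, ga: 0.02, gs: 0.01, gamma: 66, lambdaV: 2450);
        Console.WriteLine($"ET = {et}");
    }
}
```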

Input data sources:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from the NASA FTP sites
• Includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
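The reprojection step can be pictured as resampling: for each pixel of the target sinusoidal tile, find the nearest source pixel and copy its value. The sketch below is a simplified stand-in under that assumption; the names and the caller-supplied inverse mapping are illustrative, not the MODISAzure implementation (which also supports spline resampling).

```csharp
using System;

static class NearestNeighborReproject
{
    // Fills an output tile by sampling the nearest source pixel for each
    // output pixel. 'inverseMap' converts output (row, col) to fractional
    // source (row, col); out-of-range samples are left as the fill value.
    static double[,] Reproject(
        double[,] source,
        int outSize,
        Func<int, int, (double row, double col)> inverseMap,
        double fillValue = double.NaN)
    {
        var output = new double[outSize, outSize];
        int srcRows = source.GetLength(0), srcCols = source.GetLength(1);

        for (int r = 0; r < outSize; r++)
        {
            for (int c = 0; c < outSize; c++)
            {
                var (sr, sc) = inverseMap(r, c);
                int nr = (int)Math.Round(sr), nc = (int)Math.Round(sc);
                output[r, c] = (nr >= 0 && nr < srcRows && nc >= 0 && nc < srcCols)
                    ? source[nr, nc]
                    : fillValue;
            }
        }
        return output;
    }
}
```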

[Pipeline diagram: scientists submit requests to the AzureMODIS Service Web Role Portal via the Request Queue; the Download Queue feeds the Data Collection Stage from the Source Imagery Download Sites, the Reprojection Queue (with Source Metadata) feeds the Reprojection Stage, and the Reduction 1 and Reduction 2 Queues feed the Derivation Reduction and Analysis Reduction Stages; science results are returned to scientists through Scientific Results Download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work (a fan-out sketch follows the diagram note below)
• Execution status of all jobs and tasks is persisted in Tables

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues onto the <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue.]
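A minimal sketch of the parse-and-dispatch step described above: one job request is split into recoverable per-tile tasks, each task's status is recorded, and the tasks are placed on a task queue. The record types and the in-memory queue/dictionary are illustrative stand-ins for the Azure Queue and Table the service actually uses; none of these names come from the MODISAzure code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Illustrative shapes only; the real service uses Azure Queues and Tables.
record ReprojectionJob(string JobId, IReadOnlyList<string> TargetTiles);
record ReprojectionTask(string JobId, string TaskId, string TargetTile);

class ServiceMonitorSketch
{
    // Stand-ins for the <PipelineStage> Task Queue and TaskStatus table.
    readonly ConcurrentQueue<ReprojectionTask> taskQueue = new();
    readonly ConcurrentDictionary<string, string> taskStatusTable = new();

    // Parse & persist: split one job into recoverable per-tile tasks,
    // record each task's status, then dispatch onto the task queue.
    public void ParseAndDispatch(ReprojectionJob job)
    {
        int i = 0;
        foreach (var tile in job.TargetTiles)
        {
            var task = new ReprojectionTask(job.JobId, $"{job.JobId}-{i++}", tile);
            taskStatusTable[task.TaskId] = "Queued";  // persist <PipelineStage>TaskStatus
            taskQueue.Enqueue(task);                  // dispatch to the Task Queue
        }
    }
}
```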

All work is actually done by the GenericWorker (Worker Role):
• Sandboxes the science (or other) executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(a dequeue-and-retry sketch follows below)

[Diagram: the Service Monitor (Worker Role) parses & persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage> Task Queue; GenericWorker (Worker Role) instances consume that queue and use <Input> Data Storage.]
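A minimal sketch of the dequeue-and-retry behaviour listed above, again with in-memory stand-ins for the Azure Queue and Table; the type and member names are assumptions, not the MODISAzure code.

```csharp
using System;
using System.Collections.Concurrent;

class GenericWorkerSketch
{
    const int MaxRetries = 3;  // failed tasks are retried 3 times

    // Stand-ins for the <PipelineStage> Task Queue and TaskStatus table.
    readonly ConcurrentQueue<(string taskId, int attempt)> taskQueue = new();
    readonly ConcurrentDictionary<string, string> taskStatusTable = new();

    // One polling pass: dequeue a task, run it (e.g., the sandboxed science
    // executable), and either mark it Succeeded, re-enqueue it for another
    // attempt, or mark it Failed once the retries are exhausted.
    public void PollOnce(Func<string, bool> runTask)
    {
        if (!taskQueue.TryDequeue(out var task)) return;  // nothing to do

        taskStatusTable[task.taskId] = "Running";
        bool ok;
        try { ok = runTask(task.taskId); } catch { ok = false; }

        if (ok)
            taskStatusTable[task.taskId] = "Succeeded";
        else if (task.attempt < MaxRetries)
        {
            taskQueue.Enqueue((task.taskId, task.attempt + 1));
            taskStatusTable[task.taskId] = $"Retrying (attempt {task.attempt + 1})";
        }
        else
            taskStatusTable[task.taskId] = "Failed";
    }
}
```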

The generic pipeline instantiated for reprojection:

[Diagram: a Reprojection Request flows through the Job Queue to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses & persists ReprojectionTaskStatus, and dispatches onto the Reprojection Task Queue; GenericWorker (Worker Role) instances consume the queue, point to the ScanTimeList and SwathGranuleMeta tables, and use the Swath Source Data Storage and Reprojection Data Storage stores.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e., a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g., boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
(a query sketch follows below)
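To show the shape of the two table look-ups, here is a sketch using illustrative entity types and plain LINQ. The real service queries Azure Table storage; the property names below are assumptions, not the actual MODISAzure table schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative entity shapes; not the real table schema.
record SwathGranuleMetaEntity(string SwathTileId, double MinLat, double MaxLat,
                              double MinLon, double MaxLon);
record ScanTimeListEntity(string TargetTileId, DateTime ScanTime);

static class ReprojectionLookups
{
    // Geo-metadata (e.g., boundaries) for a given swath tile.
    public static IEnumerable<SwathGranuleMetaEntity> BoundariesFor(
        IQueryable<SwathGranuleMetaEntity> swathGranuleMeta, string swathTileId) =>
        swathGranuleMeta.Where(e => e.SwathTileId == swathTileId);

    // Satellite scan times that cover a given target (sinusoidal) tile.
    public static IEnumerable<DateTime> ScanTimesFor(
        IQueryable<ScanTimeListEntity> scanTimeList, string targetTileId) =>
        scanTimeList.Where(e => e.TargetTileId == targetTileId)
                    .Select(e => e.ScanTime);
}
```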

• Computational costs are driven by data scale and the need to run the reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates
(a back-of-the-envelope rate check follows the per-stage figures below)

[Pipeline diagram repeated, centered on the AzureMODIS Service Web Role Portal with the Request, Download, Reprojection, Reduction 1, and Reduction 2 Queues, the Source Imagery Download Sites, Source Metadata, and the scientists' Scientific Results Download, annotated with per-stage figures:]

• Data Collection Stage: 400-500 GB, 60K files; 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files; 3500 compute hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage: 5-7 GB, 55K files; 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage: <10 GB, ~1K files; 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

                                                                                                                                                                                      bull Clouds are the largest scale computer centers ever constructed and have the

                                                                                                                                                                                      potential to be important to both large and small scale science problems

                                                                                                                                                                                      bull Equally import they can increase participation in research providing needed

                                                                                                                                                                                      resources to userscommunities without ready access

                                                                                                                                                                                      bull Clouds suitable for ldquoloosely coupledrdquo data parallel applications and can

                                                                                                                                                                                      support many interesting ldquoprogramming patternsrdquo but tightly coupled low-

                                                                                                                                                                                      latency applications do not perform optimally on clouds today

                                                                                                                                                                                      bull Provide valuable fault tolerance and scalability abstractions

                                                                                                                                                                                      bull Clouds as amplifier for familiar client tools and on premise compute

                                                                                                                                                                                      bull Clouds services to support research provide considerable leverage for both

                                                                                                                                                                                      individual researchers and entire communities of researchers

                                                                                                                                                                                      • Day 2 - Azure as a PaaS
                                                                                                                                                                                      • Day 2 - Applications

MODISAzure: Computing Evapotranspiration (ET) in the Cloud

“You never miss the water till the well has run dry.” (Irish proverb)

ET = Water volume evapotranspired (m³ s⁻¹ m⁻²)
Δ = Rate of change of saturation specific humidity with air temperature (Pa K⁻¹)
λv = Latent heat of vaporization (J g⁻¹)
Rn = Net radiation (W m⁻²)
cp = Specific heat capacity of air (J kg⁻¹ K⁻¹)
ρa = Dry air density (kg m⁻³)
δq = Vapor pressure deficit (Pa)
ga = Conductivity of air (inverse of ra) (m s⁻¹)
gs = Conductivity of plant stoma air (inverse of rs) (m s⁻¹)
γ = Psychrometric constant (γ ≈ 66 Pa K⁻¹)

Estimating resistance/conductivity across a catchment can be tricky:
• Lots of inputs; big data reduction
• Some of the inputs are not so simple

ET = (Δ·Rn + ρa·cp·δq·ga) / [(Δ + γ·(1 + ga/gs))·λv]

Penman-Monteith (1964)

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, i.e. evaporation through plant membranes, by plants.
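The per-pixel arithmetic behind the formula can be sketched as follows. This is not the MODISAzure implementation; the function name and argument names are illustrative, inputs are assumed to already be in the units listed above, and the default γ is simply the slide's 66 Pa K⁻¹.

```python
def penman_monteith_et(delta, r_n, rho_a, c_p, dq, g_a, g_s, lambda_v, gamma=66.0):
    """Evapotranspired water volume, per the Penman-Monteith form above.

    delta    : d(saturation specific humidity)/dT (Pa K-1)
    r_n      : net radiation (W m-2)
    rho_a    : dry air density (kg m-3)
    c_p      : specific heat capacity of air (J kg-1 K-1)
    dq       : vapor pressure deficit (Pa)
    g_a, g_s : conductivities of air and plant stoma (m s-1)
    lambda_v : latent heat of vaporization
    gamma    : psychrometric constant (Pa K-1)
    """
    numerator = delta * r_n + rho_a * c_p * dq * g_a
    denominator = (delta + gamma * (1.0 + g_a / g_s)) * lambda_v
    return numerator / denominator
```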

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
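As a rough illustration of the reprojection step (not the MODISAzure code), the sketch below resamples source swath samples onto a sinusoidal target grid with nearest-neighbor assignment. The sphere radius is the value commonly quoted for the MODIS sinusoidal grid; the grid-origin and cell-size parameters are assumptions for the example.

```python
import math

R = 6371007.181  # sphere radius (m) commonly used for the MODIS sinusoidal grid

def to_sinusoidal(lat_deg, lon_deg):
    """Forward sinusoidal projection: (lat, lon) in degrees -> (x, y) in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return R * lon * math.cos(lat), R * lat

def nearest_neighbor_reproject(samples, origin_x, origin_y, cell_size):
    """Assign each (lat, lon, value) sample to the nearest target grid cell.

    Last writer wins, which is adequate for a sketch; the real pipeline can
    also use spline resampling, as noted above.
    """
    grid = {}
    for lat, lon, value in samples:
        x, y = to_sinusoidal(lat, lon)
        col = int(round((x - origin_x) / cell_size))
        row = int(round((origin_y - y) / cell_size))
        grid[(row, col)] = value
    return grid
```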

[Architecture figure: scientists submit requests to the AzureMODIS Service Web Role portal via a Request Queue; work flows through the Download, Reprojection, Reduction 1, and Reduction 2 queues across the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction stages; source imagery is pulled from external download sites, source metadata is kept in tables, and science results are returned to scientists for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction job queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, i.e. recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
(A minimal sketch of this flow follows the figure note below.)

[Figure: generic pipeline-stage flow. A <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus entries, and dispatches work onto the <PipelineStage>Task Queue.]
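A conceptual sketch of that front-door flow, using in-memory stand-ins (a dict of queue.Queue objects and a plain dict) rather than the real Azure queues and tables; the names job_queues, job_status, and web_role_submit are invented for illustration.

```python
import json
import queue
import uuid

# Stand-ins for the per-stage job queues and the JobStatus table.
job_queues = {stage: queue.Queue()
              for stage in ("Download", "Reprojection", "Reduction1", "Reduction2")}
job_status = {}

def web_role_submit(stage, request_params):
    """ModisAzure Service (Web Role): persist job status and enqueue the request
    on the job queue that matches the requested pipeline stage."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = {"stage": stage, "state": "Queued", "params": request_params}
    job_queues[stage].put(json.dumps({"job_id": job_id, "params": request_params}))
    return job_id
```

A Service Monitor would then dequeue from job_queues, expand each job into recoverable tasks, and dispatch them onto the matching task queue (see the reprojection example below).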

All work is actually done by a GenericWorker (Worker Role), sketched below:
• Sandboxes the science (or other) executable
• Marshals all storage from/to Azure blob storage to/from local Azure worker instance files
• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status

[Figure: the Service Monitor dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from it and read/write <Input>Data Storage.]
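A sketch of that worker loop, again with in-memory stand-ins; blob_download, run_science_executable, and blob_upload are placeholder callables for the real blob marshalling and sandboxed executable launch, and the slide's "3 times" is interpreted here as three attempts.

```python
import json
import queue

MAX_ATTEMPTS = 3  # per the slide: failed tasks are retried up to 3 times

def generic_worker(task_queue: queue.Queue, task_status: dict,
                   blob_download, run_science_executable, blob_upload):
    """Pull tasks, stage inputs locally, run the science executable, push outputs."""
    while True:
        task = json.loads(task_queue.get())      # dequeue a task created by the Service Monitor
        task_id = task["task_id"]
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                local_inputs = blob_download(task["inputs"])                 # blob -> local files
                local_outputs = run_science_executable(task, local_inputs)   # sandboxed run
                blob_upload(local_outputs, task["output_container"])         # local files -> blob
                task_status[task_id] = "Succeeded"
                break
            except Exception as exc:             # sketch-level error handling only
                task_status[task_id] = f"Failed (attempt {attempt}): {exc}"
        task_queue.task_done()
```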

Example: a reprojection job
[Figure: a Reprojection Request is persisted as ReprojectionJobStatus and placed on the Job Queue; the Service Monitor (Worker Role) parses it, persists ReprojectionTaskStatus entries, and dispatches per-tile tasks onto the Task Queue, where GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]
• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
(A sketch of expanding a job into per-tile tasks follows this list.)
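One plausible shape for that expansion, continuing the in-memory stand-ins above; scan_times_for_tile and swath_meta_for are placeholder callables standing in for queries against the ScanTimeList and SwathGranuleMeta tables and are assumed to return JSON-serializable metadata.

```python
import json
import queue
import uuid

def expand_reprojection_job(job, task_queue: queue.Queue, task_status: dict,
                            scan_times_for_tile, swath_meta_for):
    """Service Monitor step: turn one reprojection job into one task per target tile."""
    for tile in job["target_tiles"]:
        scan_times = scan_times_for_tile(tile)              # ScanTimeList lookup
        swaths = [swath_meta_for(t) for t in scan_times]    # SwathGranuleMeta lookup
        task_id = str(uuid.uuid4())
        task_status[task_id] = "Queued"                     # ReprojectionTaskStatus entity
        task_queue.put(json.dumps({"task_id": task_id, "tile": tile,
                                   "inputs": swaths,
                                   "output_container": "reprojected-tiles"}))
```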

• Computational costs are driven by data scale and the need to run the reductions multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

Per-stage scale and cost for the AzureMODIS service (Web Role portal plus the Request, Download, Reprojection, Reduction 1, and Reduction 2 queues):

Data Collection stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 CPU hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction stage: 5-7 GB, 55K files, 1800 CPU hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction stage: <10 GB, ~1K files, 1800 CPU hours, 20-100 workers; $216 cpu, $2 download, $9 storage

Total: $1420
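A quick back-of-the-envelope check of the cpu line items above, assuming the small-instance compute rate of roughly $0.12 per hour that Windows Azure charged at the time (the exact rate used for the project is not stated on the slide):

```python
RATE_PER_HOUR = 0.12  # assumed small-instance compute rate (USD/hour)

for stage, hours in [("Reprojection", 3500),
                     ("Derivation reduction", 1800),
                     ("Analysis reduction", 1800)]:
    print(f"{stage}: {hours} h x ${RATE_PER_HOUR:.2f}/h = ${hours * RATE_PER_HOUR:.0f}")
# Reprojection: 3500 h x $0.12/h = $420
# Derivation reduction: 1800 h x $0.12/h = $216
# Analysis reduction: 1800 h x $0.12/h = $216
```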

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users and communities without ready access
• Clouds are suitable for “loosely coupled” data-parallel applications and can support many interesting “programming patterns”, but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                                        • Day 2 - Azure as a PaaS
                                                                                                                                                                                        • Day 2 - Applications

                                                                                                                                                                                          ET = Water volume evapotranspired (m3 s-1 m-2)

                                                                                                                                                                                          Δ = Rate of change of saturation specific humidity with air temperature(Pa K-1)

                                                                                                                                                                                          λv = Latent heat of vaporization (Jg)

                                                                                                                                                                                          Rn = Net radiation (W m-2)

                                                                                                                                                                                          cp = Specific heat capacity of air (J kg-1 K-1)

                                                                                                                                                                                          ρa = dry air density (kg m-3)

                                                                                                                                                                                          δq = vapor pressure deficit (Pa)

                                                                                                                                                                                          ga = Conductivity of air (inverse of ra) (m s-1)

                                                                                                                                                                                          gs = Conductivity of plant stoma air (inverse of rs) (m s-1)

                                                                                                                                                                                          γ = Psychrometric constant (γ asymp 66 Pa K-1)

                                                                                                                                                                                          Estimating resistanceconductivity across a

                                                                                                                                                                                          catchment can be tricky

                                                                                                                                                                                          bull Lots of inputs big data reduction

                                                                                                                                                                                          bull Some of the inputs are not so simple

                                                                                                                                                                                          119864119879 = ∆119877119899 + 120588119886 119888119901 120575119902 119892119886

                                                                                                                                                                                          (∆ + 120574 1 + 119892119886 119892119904 )120582120592

                                                                                                                                                                                          Penman-Monteith (1964)

                                                                                                                                                                                          Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration or evaporation through plant membranes by plants

Input datasets:
• NASA MODIS imagery source archives: 5 TB (600K files)
• FLUXNET curated sensor dataset: 30 GB (960 files)
• FLUXNET curated field dataset: 2 KB (1 file)
• NCEP/NCAR: ~100 MB (4K files)
• Vegetative clumping: ~5 MB (1 file)
• Climate classification: ~1 MB (1 file)

20 US years = 1 global year

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms (a nearest-neighbor sketch follows this list)

Derivation reduction stage
• First stage visible to the scientist
• Computes ET in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors
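The reprojection step is essentially a resampling of swath tiles onto the sinusoidal grid. Below is a minimal nearest-neighbor sketch in Python; the array shapes and the dst_index_to_src_index geolocation callback are hypothetical stand-ins, not the MODISAzure implementation.

```python
import numpy as np

def reproject_nearest(src, dst_shape, dst_index_to_src_index):
    """Nearest-neighbor reprojection of one source tile onto a destination grid.

    src                    : 2-D array holding a source (swath) tile
    dst_shape              : (rows, cols) of the output sinusoidal tile
    dst_index_to_src_index : callable (row, col) -> fractional (row, col) in the
                             source grid; stand-in for the real geolocation lookup
    """
    dst = np.full(dst_shape, np.nan, dtype=float)
    for r in range(dst_shape[0]):
        for c in range(dst_shape[1]):
            sr, sc = dst_index_to_src_index(r, c)
            ir, ic = int(round(sr)), int(round(sc))      # nearest source pixel
            if 0 <= ir < src.shape[0] and 0 <= ic < src.shape[1]:
                dst[r, c] = src[ir, ic]
    return dst

# Toy usage: an identity mapping simply copies the tile.
tile = np.arange(16.0).reshape(4, 4)
out = reproject_nearest(tile, (4, 4), lambda r, c: (r, c))
```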

[Pipeline diagram: scientists submit requests to, and download scientific results from, the AzureMODIS Service Web Role Portal; using the source metadata, jobs flow through the Request Queue, the Download Queue (Data Collection Stage, fed by the source imagery download sites), the Reprojection Queue (Reprojection Stage), the Reduction 1 Queue (Derivation Reduction Stage), and the Reduction 2 Queue (Analysis Reduction Stage).]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

• The ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• The Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks – recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables
(A sketch of this request flow follows the diagram below.)

[Diagram: a <PipelineStage> Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and enqueues to the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work items to the <PipelineStage>Task Queue.]
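A minimal Python sketch of that flow, using in-memory stand-ins for the Azure Queues and Tables; the function names, queue names, and the per-tile task split are hypothetical, not the service's actual API.

```python
import uuid
from collections import deque

# Hypothetical in-memory stand-ins for Azure Queue and Table storage.
job_queues = {"Download": deque(), "Reprojection": deque(), "Reduction": deque()}
task_queues = {stage: deque() for stage in job_queues}
job_status, task_status = {}, {}            # stand-ins for the status Tables

def web_role_receive(stage, request):
    """Web Role front door: persist JobStatus and enqueue the job."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = {"stage": stage, "state": "Queued", "request": request}
    job_queues[stage].append(job_id)
    return job_id

def service_monitor_poll():
    """Service Monitor: parse each queued job into recoverable tasks."""
    for stage, queue in job_queues.items():
        while queue:
            job_id = queue.popleft()
            # Assume one task per requested tile; 'tiles' is an invented field.
            for tile in job_status[job_id]["request"].get("tiles", []):
                task_id = f"{job_id}:{tile}"
                task_status[task_id] = {"state": "Queued", "attempts": 0}
                task_queues[stage].append(task_id)
            job_status[job_id]["state"] = "Dispatched"

# Example: queue a reprojection job covering two tiles, then dispatch it.
jid = web_role_receive("Reprojection", {"tiles": ["h08v05", "h09v05"]})
service_monitor_poll()
```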

All work is actually done by a Worker Role (the GenericWorker)
• Sandboxes the science or other executable
• Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

[Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances pull tasks from that queue and stage their inputs from <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks 3 times
• Maintains all task status
(A worker-loop sketch follows.)
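A minimal sketch of such a worker loop with the three-attempt retry policy, again over hypothetical in-memory stand-ins rather than the real Azure storage API; run_task is a placeholder for staging blobs locally, running the sandboxed executable, and uploading results.

```python
from collections import deque

task_queue = deque()      # stand-in for the <PipelineStage>Task Queue
task_status = {}          # stand-in for the <PipelineStage>TaskStatus Table
MAX_ATTEMPTS = 3

def run_task(task_id):
    """Placeholder: stage blobs to local files, run the executable, upload results.
    Raises an exception on failure."""
    ...

def generic_worker_loop():
    while task_queue:
        task_id = task_queue.popleft()
        status = task_status.setdefault(task_id, {"state": "Queued", "attempts": 0})
        status["attempts"] += 1
        try:
            run_task(task_id)
            status["state"] = "Succeeded"
        except Exception:
            if status["attempts"] < MAX_ATTEMPTS:
                status["state"] = "Retrying"
                task_queue.append(task_id)   # requeue for another attempt
            else:
                status["state"] = "Failed"   # give up after 3 attempts

# Example: enqueue one task and drain the queue.
task_queue.append("job-1:h08v05")
generic_worker_loop()
```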

[Diagram: a Reprojection Request flows through the Job Queue to the Service Monitor (Worker Role), which persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue consumed by GenericWorker (Worker Role) instances; tasks point to Swath Source Data Storage and Reprojection Data Storage.]

• ReprojectionJobStatus: each entity specifies a single reprojection job request
• ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile

(A table-lookup sketch follows.)
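A minimal sketch of those two lookups, modelling the tables as plain Python lists of dictionaries; the entity fields and example values are invented for illustration and do not reflect the real table schema.

```python
# Hypothetical entity shapes; the real Azure Table schema is not shown on the slides.
swath_granule_meta = [
    {"granule": "swath-0500", "lat_min": 30.0, "lat_max": 40.0,
     "lon_min": -120.0, "lon_max": -110.0},
]
scan_time_list = [
    {"tile": "h08v05", "scan_times": ["2009-01-01T05:00Z", "2009-01-01T06:40Z"]},
]

def swaths_covering(lat, lon):
    """SwathGranuleMeta-style lookup: which swath granules cover this point?"""
    return [e["granule"] for e in swath_granule_meta
            if e["lat_min"] <= lat <= e["lat_max"]
            and e["lon_min"] <= lon <= e["lon_max"]]

def scan_times_for_tile(tile):
    """ScanTimeList-style lookup: which satellite scan times cover this tile?"""
    for e in scan_time_list:
        if e["tile"] == tile:
            return e["scan_times"]
    return []

print(swaths_covering(35.0, -115.0))   # ['swath-0500']
print(scan_times_for_tile("h08v05"))   # ['2009-01-01T05:00Z', '2009-01-01T06:40Z']
```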

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[The pipeline diagram above (AzureMODIS Service Web Role Portal, queues, and stages), annotated with per-stage scale and cost:]

• Data Collection Stage (source imagery download sites): 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
• Reprojection Stage: 400 GB, 45K files, 3500 compute hours, 20-100 workers; $420 CPU, $60 download
• Derivation Reduction Stage (Reduction 1): 5-7 GB, 55K files, 1800 compute hours, 20-100 workers; $216 CPU, $1 download, $6 storage
• Analysis Reduction Stage (Reduction 2): <10 GB, ~1K files, 1800 compute hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

(A quick check of the CPU figures follows.)
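The CPU figures are consistent with a flat per-hour compute rate; assuming roughly $0.12 per small-instance compute hour (an assumption, since the rate is not stated on the slide), the arithmetic works out as follows.

```python
RATE = 0.12  # assumed USD per small-instance compute hour (not from the slide)

print(3500 * RATE)  # 420.0 -> matches the Reprojection Stage CPU cost
print(1800 * RATE)  # 216.0 -> matches each Reduction Stage CPU cost
```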

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                                          • Day 2 - Azure as a PaaS
                                                                                                                                                                                          • Day 2 - Applications

                                                                                                                                                                                            NASA MODIS

                                                                                                                                                                                            imagery source

                                                                                                                                                                                            archives

                                                                                                                                                                                            5 TB (600K files)

                                                                                                                                                                                            FLUXNET curated

                                                                                                                                                                                            sensor dataset

                                                                                                                                                                                            (30GB 960 files)

                                                                                                                                                                                            FLUXNET curated

                                                                                                                                                                                            field dataset

                                                                                                                                                                                            2 KB (1 file)

                                                                                                                                                                                            NCEPNCAR

                                                                                                                                                                                            ~100MB

                                                                                                                                                                                            (4K files)

                                                                                                                                                                                            Vegetative

                                                                                                                                                                                            clumping

                                                                                                                                                                                            ~5MB (1file)

                                                                                                                                                                                            Climate

                                                                                                                                                                                            classification

                                                                                                                                                                                            ~1MB (1file)

                                                                                                                                                                                            20 US year = 1 global year

                                                                                                                                                                                            Data collection (map) stage

                                                                                                                                                                                            bull Downloads requested input tiles

                                                                                                                                                                                            from NASA ftp sites

                                                                                                                                                                                            bull Includes geospatial lookup for

                                                                                                                                                                                            non-sinusoidal tiles that will

                                                                                                                                                                                            contribute to a reprojected

                                                                                                                                                                                            sinusoidal tile

                                                                                                                                                                                            Reprojection (map) stage

                                                                                                                                                                                            bull Converts source tile(s) to

                                                                                                                                                                                            intermediate result sinusoidal tiles

                                                                                                                                                                                            bull Simple nearest neighbor or spline

                                                                                                                                                                                            algorithms

                                                                                                                                                                                            Derivation reduction stage

                                                                                                                                                                                            bull First stage visible to scientist

                                                                                                                                                                                            bull Computes ET in our initial use

                                                                                                                                                                                            Analysis reduction stage

                                                                                                                                                                                            bull Optional second stage visible to

                                                                                                                                                                                            scientist

                                                                                                                                                                                            bull Enables production of science

                                                                                                                                                                                            analysis artifacts such as maps

                                                                                                                                                                                            tables virtual sensors

                                                                                                                                                                                            Reduction 1

                                                                                                                                                                                            Queue

                                                                                                                                                                                            Source

                                                                                                                                                                                            Metadata

                                                                                                                                                                                            AzureMODIS

                                                                                                                                                                                            Service Web Role Portal

                                                                                                                                                                                            Request

                                                                                                                                                                                            Queue

                                                                                                                                                                                            Scientific

                                                                                                                                                                                            Results

                                                                                                                                                                                            Download

                                                                                                                                                                                            Data Collection Stage

                                                                                                                                                                                            Source Imagery Download Sites

                                                                                                                                                                                            Reprojection

                                                                                                                                                                                            Queue

                                                                                                                                                                                            Reduction 2

                                                                                                                                                                                            Queue

                                                                                                                                                                                            Download

                                                                                                                                                                                            Queue

                                                                                                                                                                                            Scientists

                                                                                                                                                                                            Science results

                                                                                                                                                                                            Analysis Reduction Stage Derivation Reduction Stage Reprojection Stage

                                                                                                                                                                                            httpresearchmicrosoftcomen-usprojectsazureazuremodisaspx

                                                                                                                                                                                            bull ModisAzure Service is the Web Role front door bull Receives all user requests

                                                                                                                                                                                            bull Queues request to appropriate Download Reprojection or Reduction Job Queue

                                                                                                                                                                                            bull Service Monitor is a dedicated Worker Role bull Parses all job requests into tasks ndash

                                                                                                                                                                                            recoverable units of work

                                                                                                                                                                                            bull Execution status of all jobs and tasks persisted in Tables

                                                                                                                                                                                            ltPipelineStagegt

                                                                                                                                                                                            Request

                                                                                                                                                                                            hellip ltPipelineStagegtJobStatus

                                                                                                                                                                                            Persist ltPipelineStagegtJob Queue

                                                                                                                                                                                            MODISAzure Service

                                                                                                                                                                                            (Web Role)

                                                                                                                                                                                            Service Monitor

                                                                                                                                                                                            (Worker Role)

                                                                                                                                                                                            Parse amp Persist ltPipelineStagegtTaskStatus

                                                                                                                                                                                            hellip

                                                                                                                                                                                            Dispatch

                                                                                                                                                                                            ltPipelineStagegtTask Queue

All work is actually done by a Worker Role (the GenericWorker)
• Sandboxes the science or other executable
• Marshals all storage between Azure blob storage and local files on the Azure Worker instance
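A sketch of that marshalling pattern, assuming hypothetical download_blob/upload_blob helpers standing in for the blob client; the sandboxed science executable only ever sees local files.

    import os, subprocess, tempfile

    def run_task(task, download_blob, upload_blob):
        """Stage inputs from blob storage, run the sandboxed executable, stage outputs back."""
        workdir = tempfile.mkdtemp(prefix="modis-task-")    # local Azure Worker instance files
        local_inputs = []
        for blob_name in task["input_blobs"]:
            local_path = os.path.join(workdir, os.path.basename(blob_name))
            download_blob(blob_name, local_path)            # hypothetical helper
            local_inputs.append(local_path)

        out_path = os.path.join(workdir, task["output_name"])
        # The executable works against local files only; it never touches cloud storage directly.
        subprocess.run([task["executable"], *local_inputs, out_path],
                       cwd=workdir, check=True, timeout=task.get("timeout_sec", 3600))

        upload_blob(out_path, task["output_blob"])          # hypothetical helper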

[Architecture diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches to the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue tasks from that queue and read/write <Input>Data Storage.]

• Dequeues tasks created by the Service Monitor
• Retries failed tasks up to 3 times (see the retry sketch below)
• Maintains all task status
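A minimal sketch of that dequeue-and-retry behaviour, assuming the queue message exposes a dequeue count (as Azure queue messages do) and reusing the hypothetical helpers from the sketches above; the 3-attempt limit and the status writes mirror the bullets above.

    import json

    MAX_ATTEMPTS = 3    # failed tasks are retried up to 3 times before being marked Failed

    def worker_loop(task_queue, task_table, run_task):
        while True:
            msg = task_queue.get_message(visibility_timeout=600)   # hypothetical helper
            if msg is None:
                break                                              # queue drained
            task = json.loads(msg.body)
            key = {"PartitionKey": task["job_id"], "RowKey": task["task_id"]}
            try:
                task_table.update({**key, "Status": "Running"})
                run_task(task)
                task_table.update({**key, "Status": "Succeeded"})
                task_queue.delete_message(msg)
            except Exception as err:
                if msg.dequeue_count >= MAX_ATTEMPTS:
                    task_table.update({**key, "Status": "Failed", "Error": str(err)})
                    task_queue.delete_message(msg)                 # poison message removed
                # otherwise the message reappears after the visibility timeout and is retried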

[Reprojection data flow diagram: a Reprojection Request arrives via the Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches to the Task Queue; GenericWorker (Worker Role) instances read Swath Source Data Storage and write Reprojection Data Storage.]
• ReprojectionJobStatus – each entity specifies a single reprojection job request
• ReprojectionTaskStatus – each entity specifies a single reprojection task (i.e. a single tile)
• SwathGranuleMeta – query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList – query this table to get the list of satellite scan times that cover a target tile
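An illustrative lookup combining those two metadata tables, assuming hypothetical query helpers and an intersects test supplied by the caller; the point is only that a reprojection task consults ScanTimeList for the scan times covering its target tile and SwathGranuleMeta for the matching swath boundaries.

    def source_swaths_for_tile(tile_id, scan_time_list, swath_granule_meta, intersects):
        """Find the source swath granules whose scan times cover a target sinusoidal tile."""
        swaths = []
        for entry in scan_time_list.query(partition_key=tile_id):            # hypothetical table query
            for meta in swath_granule_meta.query(partition_key=entry["scan_time"]):
                # keep only swaths whose geo-boundaries actually overlap the target tile
                if intersects(meta["boundary"], tile_id):
                    swaths.append(meta)
        return swaths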

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate-student rates

[Pipeline cost diagram: Source Imagery Download Sites feed the Download Queue and Data Collection Stage; the Reprojection, Reduction 1, and Reduction 2 Queues connect the Reprojection, Derivation Reduction, and Analysis Reduction Stages; Scientists interact through the Request Queue, Source Metadata, and Scientific Results Download via the AzureMODIS Service Web Role Portal. Per-stage data volumes, compute, and costs:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; $420 cpu, $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; $216 cpu, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; $216 cpu, $2 download, $9 storage
Total: $1420
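The cpu line items are consistent with compute billed at roughly $0.12 per small-instance hour, the Windows Azure rate of that period (an assumed rate for this back-of-the-envelope check, not stated on the slide):

    SMALL_INSTANCE_RATE = 0.12   # assumed $/hour for a small Windows Azure compute instance (circa 2010/11)

    print(3500 * SMALL_INSTANCE_RATE)   # 420.0 -> matches the $420 reprojection cpu cost
    print(1800 * SMALL_INSTANCE_RATE)   # 216.0 -> matches the $216 cpu cost of each reduction stage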

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research by providing needed resources to users and communities without ready access
• Clouds are well suited to “loosely coupled” data-parallel applications and can support many interesting “programming patterns”, but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault-tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers

                                                                                                                                                                                            • Day 2 - Azure as a PaaS
                                                                                                                                                                                            • Day 2 - Applications

Data collection (map) stage
• Downloads requested input tiles from NASA FTP sites
• Includes a geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
• Converts source tile(s) to intermediate-result sinusoidal tiles
• Simple nearest-neighbor or spline algorithms

Derivation reduction stage
• First stage visible to the scientist
• Computes ET (evapotranspiration) in our initial use

Analysis reduction stage
• Optional second stage visible to the scientist
• Enables production of science analysis artifacts such as maps, tables, and virtual sensors

These four stages are chained together by Azure queues (see the sketch below).
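One way to picture that chaining, assuming a hypothetical mapping of stage task queues and a get_queue helper; in MODISAzure the routing is driven by the service's Web Role and Service Monitor, so this is only an illustration of how a finished result feeds the next stage's queue.

    import json

    # Assumed queue names, following the pipeline diagram: each stage's output feeds the next queue.
    NEXT_QUEUE = {
        "DownloadQueue":     "ReprojectionQueue",   # Data collection  -> Reprojection
        "ReprojectionQueue": "Reduction1Queue",     # Reprojection     -> Derivation reduction
        "Reduction1Queue":   "Reduction2Queue",     # Derivation       -> Analysis reduction
        "Reduction2Queue":   None,                  # Analysis results go to Scientific Results Download
    }

    def complete_task(stage_queue_name, result_blob, get_queue):
        """After a task finishes, hand its output blob to the next stage (if any) via its queue."""
        next_name = NEXT_QUEUE[stage_queue_name]
        if next_name is not None:
            get_queue(next_name).add_message(json.dumps({"input_blobs": [result_blob]}))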

[Pipeline diagram: Source Imagery Download Sites feed the Download Queue and Data Collection Stage; the Reprojection, Reduction 1, and Reduction 2 Queues connect the Reprojection, Derivation Reduction, and Analysis Reduction Stages; Scientists submit requests through the Request Queue and Source Metadata to the AzureMODIS Service Web Role Portal and retrieve science results via Scientific Results Download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

                                                                                                                                                                                              • Day 2 - Applications

• ModisAzure Service is the Web Role front door
  • Receives all user requests
  • Queues each request to the appropriate Download, Reprojection, or Reduction Job Queue
• Service Monitor is a dedicated Worker Role
  • Parses all job requests into tasks, the recoverable units of work
• Execution status of all jobs and tasks is persisted in Tables

Diagram: a <PipelineStage>Request arrives at the MODISAzure Service (Web Role), which persists <PipelineStage>JobStatus and places the job on the <PipelineStage>Job Queue; the Service Monitor (Worker Role) parses the job, persists <PipelineStage>TaskStatus, and dispatches work items onto the <PipelineStage>Task Queue.
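To make the front-door step concrete, the following is a minimal sketch of how a Web Role might enqueue an incoming job request, assuming the 2011-era Microsoft.WindowsAzure.StorageClient library (exact method names vary between SDK versions); the class, queue names, and message format are illustrative only and are not taken from the actual MODISAzure source.

    // Illustrative front-door helper: accepts a serialized job request and
    // places it on the queue for the requested pipeline stage so that the
    // Service Monitor can pick it up. Queue names here are assumptions.
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class JobFrontDoor
    {
        private readonly CloudQueueClient queueClient;

        public JobFrontDoor(CloudStorageAccount account)
        {
            queueClient = account.CreateCloudQueueClient();
        }

        // pipelineStage is e.g. "download", "reprojection", or "reduction".
        public void SubmitJob(string pipelineStage, string jobRequestXml)
        {
            CloudQueue jobQueue = queueClient.GetQueueReference(pipelineStage + "-job-queue");
            jobQueue.CreateIfNotExist();   // idempotent; safe to call on every request
            jobQueue.AddMessage(new CloudQueueMessage(jobRequestXml));
        }
    }

With a helper along these lines, the web front end only enqueues the request and records status in Tables; it holds no job state in the Web Role itself.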

• All work is actually done by a GenericWorker (Worker Role)
  • Sandboxes the science (or other) executable
  • Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files
  • Dequeues tasks created by the Service Monitor
  • Retries failed tasks 3 times
  • Maintains all task status
Diagram: the Service Monitor (Worker Role) parses and persists <PipelineStage>TaskStatus and dispatches onto the <PipelineStage>Task Queue; GenericWorker (Worker Role) instances dequeue the tasks and read their inputs from <Input>Data Storage.
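As a rough sketch of the pattern these bullets describe (dequeue, sandbox, marshal blobs to local files, retry up to 3 times), a generic worker built on the 2011 SDK might look like the following. The queue name, container name, local-resource name ("Scratch"), executable name, connection-string setting, and message payload are all assumptions for illustration, not the actual MODISAzure code, and the upload of results back to blob storage is omitted for brevity.

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;

    public class GenericWorker : RoleEntryPoint
    {
        public override void Run()
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
            CloudQueue taskQueue = account.CreateCloudQueueClient()
                                          .GetQueueReference("reprojection-task-queue");
            CloudBlobContainer input = account.CreateCloudBlobClient()
                                              .GetContainerReference("swath-source");

            while (true)
            {
                // Hide the message for 30 minutes while this instance works on it.
                CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
                if (msg == null) { Thread.Sleep(5000); continue; }

                // A task that keeps reappearing has failed repeatedly; give up after 3 tries.
                if (msg.DequeueCount > 3) { taskQueue.DeleteMessage(msg); continue; }

                // Marshal the input blob to a local file, run the sandboxed executable,
                // and delete the message only once the work has completed.
                string blobName = msg.AsString;
                string localPath = Path.Combine(
                    RoleEnvironment.GetLocalResource("Scratch").RootPath, blobName);
                input.GetBlobReference(blobName).DownloadToFile(localPath);
                Process.Start("Reproject.exe", localPath).WaitForExit();
                taskQueue.DeleteMessage(msg);
            }
        }
    }

Because the message is only deleted after the work finishes, a crashed worker simply lets the message reappear on the queue, which is what gives the "retries failed tasks 3 times" behaviour via the dequeue count.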

Diagram: a Reprojection Request enters the Reprojection Job Queue, where each entity specifies a single reprojection job request. The Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus (each entity specifies a single reprojection task, i.e. a single tile), and dispatches the tasks onto the Reprojection Task Queue, from which GenericWorker (Worker Role) instances pick them up and write results to Reprojection Data Storage. Each task points to the Swath Source Data Storage and to two reference tables:
• SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile
• ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
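The two reference tables are ordinary Azure Tables, so a worker can look them up with the table client from the same SDK. The sketch below assumes, purely for illustration, that the tile id is used as the PartitionKey of ScanTimeList and invents an entity shape; the real MODISAzure schema is not given on the slides.

    using System;
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical shape of a ScanTimeList row; property names are guesses.
    public class ScanTimeEntity : TableServiceEntity
    {
        public DateTime ScanTime { get; set; }
        public string SourceGranule { get; set; }
    }

    public class ReprojectionMetadata
    {
        private readonly CloudTableClient tableClient;

        public ReprojectionMetadata(CloudStorageAccount account)
        {
            tableClient = account.CreateCloudTableClient();
        }

        // Returns the satellite scan times covering one target tile, assuming
        // the tile id is the table's PartitionKey.
        public ScanTimeEntity[] GetScanTimesForTile(string tileId)
        {
            TableServiceContext ctx = tableClient.GetDataServiceContext();
            return ctx.CreateQuery<ScanTimeEntity>("ScanTimeList")
                      .Where(e => e.PartitionKey == tileId)
                      .ToArray();
        }
    }

A similar query against SwathGranuleMeta would return the geo-metadata (tile boundaries) for each swath before the reprojection executable is run.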

• Computational costs are driven by data scale and the need to run the reduction multiple times
• Storage costs are driven by data scale and the 6-month project duration
• Both are small with respect to the people costs, even at graduate student rates
Diagram: the AzureMODIS Service Web Role Portal feeds the Request Queue, Download Queue, Reprojection Queue, Reduction 1 Queue, Reduction 2 Queue, and Source Metadata; source imagery flows from the Source Imagery Download Sites through the Data Collection, Reprojection, Derivation Reduction, and Analysis Reduction Stages, and Scientists retrieve results via Scientific Results Download.
Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; cost: $50 upload + $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 hours, 20-100 workers; cost: $420 cpu + $60 download
Derivation Reduction Stage: 5-7 GB, 55K files, 1800 hours, 20-100 workers; cost: $216 cpu + $1 download + $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 hours, 20-100 workers; cost: $216 cpu + $2 download + $9 storage
Total: $1420
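As a rough check (assuming the 2011 Windows Azure list prices of about $0.12 per small-instance compute hour and $0.15 per GB per month of storage, which are not stated on the slide itself): 3,500 hours x $0.12 = $420 and 1,800 hours x $0.12 = $216 for compute, and 500 GB x $0.15/GB-month x 6 months = $450 for storing the source imagery, all matching the per-stage figures above. The itemized costs sum to roughly $1,430, close to the slide's stated total of $1420.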

• Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems
• Equally important, they can increase participation in research, providing needed resources to users/communities without ready access
• Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today
• Clouds provide valuable fault tolerance and scalability abstractions
• Clouds act as an amplifier for familiar client tools and on-premises compute
• Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers



Diagram: a Reprojection Request enters the Reprojection Job Queue; the Service Monitor (Worker Role) persists ReprojectionJobStatus, parses and persists ReprojectionTaskStatus, and dispatches onto the Reprojection Task Queue; GenericWorker (Worker Role) instances dequeue the tasks, which point to the ScanTimeList and SwathGranuleMeta tables, and write their output to:

                                                                                                                                                                                                    Reprojection Data

                                                                                                                                                                                                    Storage

                                                                                                                                                                                                    Each entity specifies a

                                                                                                                                                                                                    single reprojection job

                                                                                                                                                                                                    request

                                                                                                                                                                                                    Each entity specifies a

                                                                                                                                                                                                    single reprojection task (ie

                                                                                                                                                                                                    a single tile)

                                                                                                                                                                                                    Query this table to get

                                                                                                                                                                                                    geo-metadata (eg

                                                                                                                                                                                                    boundaries) for each swath

                                                                                                                                                                                                    tile

                                                                                                                                                                                                    Query this table to get the

                                                                                                                                                                                                    list of satellite scan times

                                                                                                                                                                                                    that cover a target tile

                                                                                                                                                                                                    Swath Source

                                                                                                                                                                                                    Data Storage
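The pattern behind this figure is the standard Azure queue-driven worker: each GenericWorker instance polls the Task Queue, looks up per-tile metadata in table storage, performs the reprojection, and deletes the message only after the work succeeds, so a failed worker simply lets the message reappear for another instance. The sketch below illustrates that loop with the 2011-era Microsoft.WindowsAzure.StorageClient library; the queue name and the ProcessTile helper are illustrative assumptions, not the AzureMODIS source.

```csharp
using System;
using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class GenericWorker
{
    public static void Main()
    {
        // Illustrative connection string and queue name -- not the AzureMODIS originals.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudQueue taskQueue = account.CreateCloudQueueClient().GetQueueReference("taskqueue");
        taskQueue.CreateIfNotExist();

        while (true)
        {
            // The message becomes invisible for 30 minutes; if this worker dies,
            // the task reappears and another instance picks it up (fault tolerance).
            CloudQueueMessage msg = taskQueue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg == null) { Thread.Sleep(5000); continue; }

            string tileId = msg.AsString;   // e.g. a key used against SwathGranuleMeta / ScanTimeList
            ProcessTile(tileId);            // hypothetical reprojection step

            taskQueue.DeleteMessage(msg);   // delete only after the work has succeeded
        }
    }

    static void ProcessTile(string tileId)
    {
        // Placeholder for the real work: query table storage for the tile's
        // geo-metadata and scan times, fetch swath data from blob storage,
        // reproject, and write the result to Reprojection Data Storage.
        Console.WriteLine("Reprojecting tile " + tileId);
    }
}
```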

• Computational costs are driven by the data scale and the need to run the reductions multiple times.
• Storage costs are driven by the data scale and the 6-month project duration.
• Both are small with respect to the people costs, even at graduate-student rates.

[Figure: AzureMODIS pipeline] Source imagery is pulled from the Source Imagery Download Sites via a Download Queue, then flows through the Reprojection Queue, Reduction 1 Queue, and Reduction 2 Queue; a Source Metadata Request Queue and the AzureMODIS Service Web Role Portal coordinate the stages, and scientists retrieve the final products through Scientific Results Download.

Stage-by-stage scale and cost:
Data Collection Stage:       400-500 GB, 60K files; ~10 MB/sec over ~11 hours, <10 workers; $50 upload + $450 storage
Reprojection Stage:          400 GB, 45K files; 3500 compute-hours, 20-100 workers; $420 cpu + $60 download
Derivation Reduction Stage:  5-7 GB, 55K files; 1800 compute-hours, 20-100 workers; $216 cpu + $1 download + $6 storage
Analysis Reduction Stage:    <10 GB, ~1K files; 1800 compute-hours, 20-100 workers; $216 cpu + $2 download + $9 storage

Total: $1420
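For a rough check of where the total comes from, the cpu line items are simply billed instance-hours times the hourly rate. Below is a minimal back-of-the-envelope sketch, assuming a rate of about $0.12 per compute-hour (an assumption of this sketch, not a figure from the slide); at that rate 3500 hours gives $420 and each 1800-hour reduction gives $216, matching the per-stage numbers above, and adding the transfer and storage items lands close to the quoted total.

```csharp
using System;

class AzureModisCostEstimate
{
    static void Main()
    {
        // Assumed per-hour compute price; not taken from the slide.
        const double dollarsPerComputeHour = 0.12;

        // Compute-hours per stage, from the table above.
        double reprojection        = 3500 * dollarsPerComputeHour; // ~ $420
        double derivationReduction = 1800 * dollarsPerComputeHour; // ~ $216
        double analysisReduction   = 1800 * dollarsPerComputeHour; // ~ $216

        // Upload, storage, and download line items, also from the table above.
        double transferAndStorage = 50 + 450 + 60 + 1 + 6 + 2 + 9;

        double total = reprojection + derivationReduction + analysisReduction + transferAndStorage;
        Console.WriteLine("Estimated total: ${0:F0}", total); // on the order of the ~$1420 quoted
    }
}
```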
