Shelter from the Storm: Building a Safe Archive in a Hostile World

Dec 20, 2015


Shelter from the Storm

Building a Safe Archive in a Hostile World


SCOOP Goal

• SURA-funded Coastal Modeling Project

• Want to develop the community’s cutting-edge techniques to make them ready for use in tomorrow’s production systems.

• For example, automatic verification of storm/surge models against observed data, to help improve the models


CCT Goals

• One of CCT’s key research outputs is software

• Want this software to be robust and of good quality

• Want re-use of software across projects

• Also want software to be picked up by external users, as well as collaborators


The SCOOP Archive

• Need to archive lots of files
  – Atmospheric models (MM5, GFDL)
  – Hydrodynamic models (ADCIRC, SWAN, etc.)
  – Observational data (sensor data, buoys)

• Requirements poorly defined:
  – How much data? Don’t know
  – How long should we keep it for? Don’t know

• Have to interface with bespoke data transport mechanisms (LDM)

• How to achieve our goals under these conditions?!


Basic Archive Operation

Upload:

1. Client signals that it wants to upload some files (names are given)

2. Archive tells the client where to upload them to (transaction handles)

3. Client uploads files (independently of the archive)

4. Client tells archive it’s done

5. Archive creates the logical filenames

• Use “upload” tool for this
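
The upload steps above can be sketched as a small in-memory model. This is only an illustration: the class name UploadArchive, the methods beginUpload/stagingUrl/endUpload, the "file:///staging/" layout and the "lfn://" naming rule are all assumptions, not the SCOOP service's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of the upload handshake; not the SCOOP service API.
struct Transaction {
    std::string stagingUrl;              // where the client should put the files
    std::vector<std::string> fileNames;  // names announced in step 1
};

class UploadArchive {
public:
    // Steps 1-2: client announces file names, archive returns a transaction handle.
    std::string beginUpload(const std::vector<std::string>& fileNames) {
        std::string handle = "txn-" + std::to_string(nextId_++);
        // Assumed staging layout, for illustration only.
        pending_[handle] = Transaction{"file:///staging/" + handle, fileNames};
        return handle;
    }

    // Step 2: upload location for a handle; empty string if the handle is unknown.
    std::string stagingUrl(const std::string& handle) const {
        auto it = pending_.find(handle);
        if (it == pending_.end()) return std::string();
        return it->second.stagingUrl;
    }

    // Steps 4-5: client reports completion, archive creates the logical filenames.
    // Returns false if the handle is unknown (for example, already completed).
    bool endUpload(const std::string& handle) {
        auto it = pending_.find(handle);
        if (it == pending_.end()) return false;
        for (const auto& name : it->second.fileNames) {
            logicalNames_.push_back("lfn://" + name);  // assumed naming rule
        }
        pending_.erase(it);
        return true;
    }

    const std::vector<std::string>& logicalNames() const { return logicalNames_; }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    unsigned long nextId_ = 1;
    std::map<std::string, Transaction> pending_;
    std::vector<std::string> logicalNames_;
};
```

Step 3 (the file transfer itself) is deliberately absent from the class: the client moves the files on its own, so the archive only needs to remember the open transaction until it is told the upload is done.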


Basic Archive Operation

Download:

1. Clients use the catalog service to discover/search for logical filenames

2. Clients talk to the RLS server to get physical URLs

3. Clients interact with the physical URLs directly

• Can use “getdata” CLI tool to encapsulate this

• Also, there are portal pages...
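
A minimal sketch of the two lookups behind a download, assuming the catalog can be modelled as keyword -> logical filename and the RLS server as logical filename -> physical URLs. The type aliases and function names (Catalog, ReplicaIndex, findLogicalNames, resolve, locate) are hypothetical stand-ins, not the real services' APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the catalog service: keyword -> logical filenames.
using Catalog = std::multimap<std::string, std::string>;
// Hypothetical stand-in for the RLS server: logical filename -> physical URLs.
using ReplicaIndex = std::map<std::string, std::vector<std::string>>;

// Step 1: discover the logical filenames that match a keyword.
std::vector<std::string> findLogicalNames(const Catalog& catalog, const std::string& keyword) {
    std::vector<std::string> names;
    auto range = catalog.equal_range(keyword);
    for (auto it = range.first; it != range.second; ++it) {
        names.push_back(it->second);
    }
    return names;
}

// Step 2: map one logical filename to its physical URLs (empty if none registered).
std::vector<std::string> resolve(const ReplicaIndex& rls, const std::string& logicalName) {
    auto it = rls.find(logicalName);
    if (it == rls.end()) return std::vector<std::string>();
    return it->second;
}

// Steps 1-2 combined: every physical URL for every matching logical filename.
std::vector<std::string> locate(const Catalog& catalog, const ReplicaIndex& rls,
                                const std::string& keyword) {
    std::vector<std::string> urls;
    for (const auto& name : findLogicalNames(catalog, keyword)) {
        for (const auto& url : resolve(rls, name)) {
            urls.push_back(url);
        }
    }
    return urls;
}
```

Step 3 is left to the caller, who fetches from whichever returned URL it prefers.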


Operations on Service

• fileUploadBegin - for starting an upload

• fileUploadEnd - for saying that an upload is completed

• logicalNameRetry

• removeDeadTransactions

• closeArchive
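
The slides name the operations but not their semantics. As one possible reading of removeDeadTransactions, the sketch below assumes a transaction is "dead" when an upload was begun but not ended within a time limit. The struct, the function name sweepDeadTransactions and the timeout parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical record of an upload that has begun but not yet ended.
struct OpenTransaction {
    long startedAt;  // seconds since some epoch, supplied by the caller
};

// Assumed meaning of "dead": open for longer than maxAgeSeconds.
// Removes such transactions and returns how many were removed.
std::size_t sweepDeadTransactions(std::map<std::string, OpenTransaction>& open,
                                  long now, long maxAgeSeconds) {
    std::size_t removed = 0;
    for (auto it = open.begin(); it != open.end();) {
        if (now - it->second.startedAt > maxAgeSeconds) {
            it = open.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```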


Distributed Software

• Some services hosted externally

• Can’t assume our machine or s/w never fails

• Need to retain state of our service on restart


Robust Code

• Don’t assume our service will remain “up”
  => Keep all internal state in a database
  => Reload internal state on a restart

• Don’t assume external services are always “up”
  => Design loosely coupled services
  => Store pending interactions in the database
  => Retry these periodically

• Do “stress testing” on the service during the testing/debug cycle
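
The "store pending interactions, retry periodically" idea can be sketched as follows. This is an illustration under stated assumptions: RetryQueue and PendingInteraction are hypothetical names, a std::deque stands in for the database table, and the send callback stands in for the call to an external service.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical pending interaction, e.g. a logical name waiting to be
// registered with an external service. In the design described on the slide
// this row would live in a database so that it survives a restart.
struct PendingInteraction {
    std::string payload;
    unsigned attempts;
};

class RetryQueue {
public:
    void add(const std::string& payload) {
        rows_.push_back(PendingInteraction{payload, 0});
    }

    // One periodic pass: try each pending row once and keep the ones that fail.
    // `send` returns true when the external service accepted the payload.
    std::size_t retryOnce(const std::function<bool(const std::string&)>& send) {
        std::size_t delivered = 0;
        std::deque<PendingInteraction> stillPending;
        for (auto& row : rows_) {
            ++row.attempts;
            if (send(row.payload)) {
                ++delivered;
            } else {
                stillPending.push_back(row);
            }
        }
        rows_.swap(stillPending);
        return delivered;
    }

    std::size_t pending() const { return rows_.size(); }

private:
    std::deque<PendingInteraction> rows_;  // stand-in for a database table
};
```

Because a failed send only leaves the row in place, an outage of the external service delays work rather than losing it, which is the loose coupling the slide asks for.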


Keep the Internal APIs Simple

int logname_initialize(void);

void logname_remove(void);

bool logname_create_logfile(std::string logical_name,
                            bool name_is_final,
                            const std::vector<std::string>& urls);

bool logname_delete_logfile(std::string logical_name);

ulong logname_upload_pending_lognames(ulong max_rows,
                                      ulong& total_found,
                                      ulong& max_rows_used);


Encouraging Reuse

• SCOOP Archive has lots of strange rules about filenames and metadata

• During design and implementation, keep thinking:
  – Is this for the SCOOP project, or
  – Is this a generic feature?

• Use good O-O design to keep SCOOP code separate from archive code


Keeping SCOOP to one side...

class ArchiveFilingLogic {
public:
    // Called by the default moveFiles implementation
    virtual bool createPhysicalPath(std::string physicalPath);

    virtual bool moveFiles(std::vector<std::string>& fileNames,
                           std::vector<std::string>& missingFiles,
                           std::string stagePath,
                           std::string physicalPath);

    virtual void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                          std::map<std::string, std::string>& directories,
                                          std::map<std::string, std::string>& errors) = 0;

    virtual std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                          std::string physicalPath) = 0;
};
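
To illustrate how project-specific rules might plug into this interface, here is a sketch of a subclass. It assumes a trimmed copy of the base class (only the two pure virtual methods, plus a virtual destructor added for safe deletion). The PrefixFilingLogic class and its "<model>_<rest>" naming rule are invented for the example and are not the real SCOOP conventions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Trimmed copy of the interface from the slide: only the two pure virtual
// methods are reproduced; createPhysicalPath and moveFiles are omitted.
// The virtual destructor is an addition, not shown on the slide.
class ArchiveFilingLogic {
public:
    virtual ~ArchiveFilingLogic() {}
    virtual void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                          std::map<std::string, std::string>& directories,
                                          std::map<std::string, std::string>& errors) = 0;
    virtual std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                          std::string physicalPath) = 0;
};

// Hypothetical project-specific policy (not the real SCOOP rules):
// a file named "<model>_<rest>" is stored under directory "<model>".
class PrefixFilingLogic : public ArchiveFilingLogic {
public:
    void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                  std::map<std::string, std::string>& directories,
                                  std::map<std::string, std::string>& errors) override {
        for (const auto& name : filenames) {
            std::string::size_type cut = name.find('_');
            if (cut == std::string::npos || cut == 0) {
                errors[name] = "no model prefix";  // report instead of guessing
            } else {
                directories[name] = name.substr(0, cut);
            }
        }
    }

    // Assumed rule: logical name is the physical path joined with the file name.
    std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                  std::string physicalPath) override {
        std::vector<std::string> names;
        for (const auto& name : filenames) {
            names.push_back(physicalPath + "/" + name);
        }
        return names;
    }
};
```

Generic archive code would hold only an ArchiveFilingLogic reference, so swapping in another project's rules would mean writing another subclass rather than editing the archive.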


New Requirements

• Handling common compression formats

• Producing subsets of data (predictively)

• Tracking data before it is ingested

• Notifying people when data arrives

• Transforming data to other formats

• Generating analytical data “on the fly”

• Federating data across multiple locations

• Good initial design will simplify all this...


Highest Priority...

• Archive machine running out of space

• People have started to rely on the service

• So, currently we are uploading copies of all data to SDSC DataCenter, using SRB

• Now need to keep track of URLs on physically distributed resources

• But SRB can help with some of the other requirements...


Any Questions?