From SRB to iRODS: Policy Virtualization using Rule-Based Data Grids
Reagan W. Moore, Wayne Schroeder, Arcot Rajasekar, Mike Wan
San Diego Supercomputer Center
http://irods.sdsc.edu
• Data grids organize distributed data into shared collections
  • Persistent name spaces for files, users, storage
  • Collection attributes
    • Provenance, descriptive, system metadata
• Data grids manage heterogeneous storage systems
  • Standard operations across file systems, tape archives, object ring buffers
  • Enable technology evolution
    • When new technology becomes available, both the old and new systems can be integrated
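The two ideas above, a persistent logical name space and standard operations across heterogeneous storage, can be sketched in a few lines of Python. This is a hypothetical illustration, not SRB or iRODS code: `StorageDriver`, `InMemoryDriver`, and `DataGrid` are names invented for this example. Users address data only by logical path; the grid records which storage resources hold replicas, so a new storage system can be added and data migrated onto it without changing the logical name.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical minimal set of standard operations every storage system supports."""

    @abstractmethod
    def put(self, physical_path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, physical_path: str) -> bytes: ...

    @abstractmethod
    def delete(self, physical_path: str) -> None: ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a file system, tape archive, or other storage backend."""

    def __init__(self, name: str):
        self.name = name
        self._blobs = {}  # physical path -> bytes

    def put(self, physical_path, data):
        self._blobs[physical_path] = data

    def get(self, physical_path):
        return self._blobs[physical_path]

    def delete(self, physical_path):
        del self._blobs[physical_path]


class DataGrid:
    """Maps persistent logical names to replicas on named storage resources."""

    def __init__(self):
        self.resources = {}  # resource name -> StorageDriver
        self.catalog = {}    # logical path -> {resource name: physical path}

    def add_resource(self, name: str, driver: StorageDriver) -> None:
        self.resources[name] = driver

    def put(self, logical: str, data: bytes, resource: str) -> None:
        physical = f"/{resource}{logical}"
        self.resources[resource].put(physical, data)
        self.catalog.setdefault(logical, {})[resource] = physical

    def get(self, logical: str) -> bytes:
        # Any replica will do; the caller never sees where it came from.
        resource, physical = next(iter(self.catalog[logical].items()))
        return self.resources[resource].get(physical)

    def migrate(self, logical: str, src: str, dst: str) -> None:
        """Move a replica onto new technology; the logical name is unchanged."""
        data = self.resources[src].get(self.catalog[logical][src])
        self.put(logical, data, dst)
        self.resources[src].delete(self.catalog[logical].pop(src))
```

For example, after `grid.put("/home/alice/run1.dat", data, "disk")` and `grid.migrate("/home/alice/run1.dat", "disk", "tape")`, a client still calls `grid.get("/home/alice/run1.dat")` and receives the same bytes.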
Data Grid
Using a Data Grid – in Abstract
(Figure: "Ask for data" / "Data delivered")
• User asks for data from the data grid
• The data is found and returned
  • Where & how details are hidden
Using a Data Grid - Details
(Figure: two iRODS Servers and a Metadata Catalog DB)
• User asks for data
• Data request goes to iRODS Server
• Server looks up information in catalog
• Catalog tells which iRODS server has data
• 1st server asks 2nd for data
• The 2nd iRODS server applies rules
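The request flow above can be modeled as a catalog lookup followed by a server-to-server forward. In this hypothetical sketch (not the iRODS wire protocol), `Catalog` stands in for the metadata catalog database, each `Server` holds some data locally, and the `policy` callback stands in for the rules the second server applies before releasing data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Server:
    name: str
    files: Dict[str, bytes] = field(default_factory=dict)
    # Hypothetical stand-in for the rules a server applies before serving data.
    policy: Callable[[str, str], bool] = lambda user, path: True

    def serve(self, user: str, path: str) -> bytes:
        if not self.policy(user, path):
            raise PermissionError(f"{self.name}: rule denied {user} access to {path}")
        return self.files[path]


@dataclass
class Catalog:
    # logical path -> name of the server that stores the data
    locations: Dict[str, str] = field(default_factory=dict)


def handle_request(first: Server, catalog: Catalog, servers: Dict[str, Server],
                   user: str, path: str) -> bytes:
    """The user contacts `first`; it consults the catalog and forwards if needed."""
    owner = catalog.locations[path]          # catalog says which server has the data
    if owner == first.name:
        return first.serve(user, path)
    return servers[owner].serve(user, path)  # 1st server asks 2nd; 2nd applies its rules
```

The user only ever talks to the first server; where the data lives and which rules guard it stay hidden behind `handle_request`.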
Extremely Successful
• Storage Resource Broker (SRB) manages 2 PB of data in internationally shared collections
• Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
  • Astronomy: Data grid
  • Bio-informatics: Digital library
  • Earth Sciences: Data grid
  • Ecology: Collection
  • Education: Persistent archive
  • Engineering: Digital library
  • Environmental science: Data grid
  • High energy physics: Data grid
  • Humanities: Data grid
  • Medical community: Digital library
  • Oceanography: Real-time sensor data, persistent archive
  • Seismology: Digital library, real-time sensor data
• The goal has been generic infrastructure for distributed data
• Observe that as shared collections grow, administrative tasks can become onerous
  • Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment
• Need to minimize labor support by automating administrative functions
  • File ingestion tasks
  • Verification of desired collection properties
  • Integrity checks and replica management
• Observe that each community has unique management policies
  • User administration
  • File retention & deletion
  • Time-dependent access controls
  • Data distribution and replication
  • File update (versions, backups)
  • Descriptive metadata
• The users of the collection have their own criteria for the properties they expect
• Socialization is the mapping from creator assertions to user expectations
Data Grid Mechanisms
• Essential components needed for synergism are implemented in SRB
  • Infrastructure independence
  • Data and trust virtualization
• Components needed for specific management policies and processes are implemented in iRODS
  • Map policies to rules that control all processes
  • Map processes to standard micro-services
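Mapping each community's policies to rules lets the same server code enforce different policies per collection. The following is a hypothetical sketch: two collections express a retention policy and a time-dependent access control as data, and one generic check consults whichever policy applies. Collection names, the `Policy` fields, and all values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Policy:
    retention_days: int              # minimum age before deletion is allowed
    embargo_until: Optional[date]    # time-dependent access: no public reads before this date


# Hypothetical per-community policies, keyed by collection.
POLICIES = {
    "/zone/archive": Policy(retention_days=3650, embargo_until=None),
    "/zone/sensors": Policy(retention_days=30, embargo_until=date(2008, 1, 1)),
}


def policy_for(path: str) -> Policy:
    # The longest matching collection prefix wins.
    coll = max((c for c in POLICIES if path.startswith(c + "/")), key=len)
    return POLICIES[coll]


def may_delete(path: str, created: date, today: date) -> bool:
    return today - created >= timedelta(days=policy_for(path).retention_days)


def may_read_publicly(path: str, today: date) -> bool:
    embargo = policy_for(path).embargo_until
    return embargo is None or today >= embargo
```

Changing a community's policy then means editing its entry in `POLICIES`, not the code that enforces it.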
Data Management
(Figure: layered view of data management)
• Data Management Environment: Conserved Properties, Control Mechanisms, Remote Operations
• Management Functions: Assessment Criteria, Management Policies, Capabilities
• Data grid – Management virtualization (Data Management Infrastructure): Persistent State, Rules, Micro-services
• Data grid – Data and trust virtualization (Physical Infrastructure): Database, Rule Engine, Storage System
iRODS - integrated Rule-Oriented Data System
Rules
• Rule classes
  • System-enforced rules
  • Administrator-controlled rules
  • User-defined rules
• Rule execution
  • Atomic rules - executed on each operation invoked by a client
  • Deferred rules - executed at a future time
  • Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)
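The three execution modes can be illustrated with a small scheduler. This is a hypothetical sketch, not the iRODS rule engine: atomic rules run synchronously inside each client operation, deferred rules wait on a time-ordered queue, and periodic rules re-queue themselves after each run. The example periodic rule enforces an invented integrity property (every file has at least two replicas).

```python
import heapq
import itertools
from typing import Callable, Optional


class RuleScheduler:
    """Hypothetical scheduler for atomic, deferred, and periodic rules."""

    def __init__(self) -> None:
        self.atomic = {}               # operation name -> list of rules
        self._queue = []               # heap of (due time, seq, rule, period)
        self._seq = itertools.count()  # tie-breaker so heapq never compares callables

    def on(self, operation: str, rule: Callable[[dict], None]) -> None:
        """Register an atomic rule, fired on every client invocation of `operation`."""
        self.atomic.setdefault(operation, []).append(rule)

    def invoke(self, operation: str, context: dict) -> None:
        for rule in self.atomic.get(operation, []):
            rule(context)

    def defer(self, at: float, rule: Callable[[], None],
              period: Optional[float] = None) -> None:
        """Queue a deferred rule; with a `period` it re-queues itself (periodic rule)."""
        heapq.heappush(self._queue, (at, next(self._seq), rule, period))

    def run_until(self, now: float) -> None:
        """Execute every queued rule whose due time is at or before `now`."""
        while self._queue and self._queue[0][0] <= now:
            at, _, rule, period = heapq.heappop(self._queue)
            rule()
            if period is not None:
                self.defer(at + period, rule, period)


# Example periodic rule (hypothetical policy): every file should have >= 2 replicas.
replica_counts = {"/zone/home/alice/a.dat": 1}
repaired = []


def enforce_two_replicas() -> None:
    for path, count in replica_counts.items():
        if count < 2:
            replica_counts[path] = 2  # stand-in for calling a replication micro-service
            repaired.append(path)
```

Registering `enforce_two_replicas` with `defer(10.0, enforce_two_replicas, period=10.0)` makes it validate the collection every 10 time units, independently of client activity.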
iRODS Rule Syntax
• Event | Condition | Action-set | Recovery-set
  • Event - triggered by an operation or a queued rule
  • Condition - composed of tests on any attributes in the persistent state information
  • Action-set - composed from both micro-services and rules
  • Recovery-set - used to ensure transaction semantics and consistent state information
• Executed by a rule engine installed at each storage location (server-side workflows)
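The Event | Condition | Action-set | Recovery-set structure can be sketched as a small data structure plus an executor. This is not the iRODS rule language; the `Rule` class and `fire` function are hypothetical. What it illustrates is the recovery-set: each action is paired with a recovery step, and if a later action fails, the recovery steps for the actions already completed run in reverse order so the persistent state stays consistent.

```python
from dataclasses import dataclass
from typing import Callable, List

State = dict                      # persistent state information: attribute -> value
Step = Callable[[State], None]    # a micro-service or nested rule


@dataclass
class Rule:
    event: str                          # operation or queued rule that triggers it
    condition: Callable[[State], bool]  # test on persistent state attributes
    actions: List[Step]                 # action-set
    recoveries: List[Step]              # recovery-set: recoveries[i] undoes actions[i]


def fire(rules: List[Rule], event: str, state: State) -> bool:
    """Run the first rule whose event and condition match; return True on success."""
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        done = 0
        try:
            for action in rule.actions:
                action(state)
                done += 1
            return True
        except Exception:
            # Undo completed actions in reverse order to restore consistent state.
            for recover in reversed(rule.recoveries[:done]):
                recover(state)
            return False
    return False  # no applicable rule
```

Pairing recoveries index-by-index with actions is one simple design choice; it keeps the undo logic generic, at the cost of requiring every action to declare its own recovery step.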
Micro-Services
• The challenge is that storage systems do not provide the desired processes
  • Storage systems have a “minimal” set of standard operations that are performed at the storage system
  • Clients require actions such as replication and metadata extraction
• Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes
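As an illustration of aggregating minimal storage operations into reusable modules, the sketch below builds a hypothetical `replicate_service` micro-service from primitive `read`, `write`, and `remove` operations plus a checksum, then assembles a process from it. `MemoryStorage` and all function names are invented for this example, and the checksum verification step is an assumption about what a replication process might check; none of this reflects actual iRODS micro-service signatures.

```python
import hashlib


class MemoryStorage:
    """Stand-in for a storage system that offers only primitive operations."""

    def __init__(self) -> None:
        self.objects = {}  # path -> bytes

    def read(self, path: str) -> bytes:
        return self.objects[path]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path] = data

    def remove(self, path: str) -> None:
        del self.objects[path]


# Micro-services: small, standard modules built from the primitives above.

def checksum_service(store: MemoryStorage, path: str) -> str:
    return hashlib.sha256(store.read(path)).hexdigest()


def replicate_service(src: MemoryStorage, dst: MemoryStorage, path: str) -> str:
    """Copy an object, then verify the copy; undo and raise on mismatch."""
    dst.write(path, src.read(path))
    expected = checksum_service(src, path)
    if checksum_service(dst, path) != expected:
        dst.remove(path)
        raise IOError(f"replica of {path} failed verification")
    return expected


# A process assembled from micro-services instead of being coded per storage system.
def ingest_with_replica(primary: MemoryStorage, secondary: MemoryStorage,
                        path: str, data: bytes) -> str:
    primary.write(path, data)
    return replicate_service(primary, secondary, path)
```

Any storage system that can supply the three primitives gets replication with verification for free.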
Data Virtualization
(Figure components: Access Interface; Standard Micro-services; Data Grid; Standard Operations; Storage Protocol; Storage System)
Map from the actions requested by the access method to a standard set of micro-services. The standard micro-services are mapped to the operations supported by the storage system.
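The two-stage mapping in the caption can be expressed as two lookup tables. Every name here is hypothetical: the access-method actions (`put`, `get`, `replicate`), the micro-service names, and the per-storage operation names, including the tape operations, are invented to show the shape of the mapping.

```python
from typing import Dict, List, Tuple

# Stage 1: access-method action -> standard micro-services (hypothetical names).
ACTION_TO_MICROSERVICES: Dict[str, List[str]] = {
    "put": ["create_object", "write_object", "register_in_catalog"],
    "get": ["locate_replica", "read_object"],
    "replicate": ["locate_replica", "read_object", "create_object",
                  "write_object", "register_in_catalog"],
}

# Stage 2: micro-service -> operation supported by each storage system.
# Only micro-services that touch storage need an entry; others run in the server.
STORAGE_OPERATIONS: Dict[str, Dict[str, str]] = {
    "posix_fs": {"create_object": "open(O_CREAT)", "write_object": "write",
                 "read_object": "read"},
    "tape_archive": {"create_object": "stage_create", "write_object": "migrate",
                     "read_object": "stage_read"},
}


def plan(action: str, storage_type: str) -> List[Tuple[str, str]]:
    """Expand a client action into (micro-service, storage operation) steps."""
    ops = STORAGE_OPERATIONS[storage_type]
    return [(ms, ops.get(ms, "(catalog/server only)"))
            for ms in ACTION_TO_MICROSERVICES[action]]
```

Supporting a new storage system means adding one entry to `STORAGE_OPERATIONS`; the access interfaces and micro-services are untouched.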
integrated Rule-Oriented Data System
(Architecture figure components: Client Interface; Admin Interface; Rule Invoker; Rule Engine; Rule Base; Current State; Confs; Service Manager; Micro-Service Modules for Metadata-based Services and for Resource-based Services; Resources; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Modules; Metadata Persistent Repository)
Distributed Management System
(Figure components: Rule Engine; Data Transport; Metadata Catalog; Execution Control; Messaging System; Execution Engine; Virtualization; Server-Side Workflow; Persistent State Information; Scheduling; Policy Management)
Micro-service Classes
• Test
• System
• Workflow control
• Client
• iCAT catalog
• User-level, invoked by “irule”
• Image manipulation
Digital Preservation
• The preservation community is defining the rules needed to assert the trustworthiness of a digital repository
  • RLG/NARA - Trustworthy Repositories Audit &
• Structured information
  • Parse audit trails to generate compliance reports
  • Apply templates to extract information
  • Apply templates to format state information
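Generating a compliance report from an audit trail amounts to parsing structured records, extracting the facts a criterion needs, and applying a template to format them. The sketch below is hypothetical throughout: the pipe-separated record layout, the two example criteria (denied operations, ingested files lacking a checksum event), and the report template are assumptions for illustration, not the RLG/NARA checklist or an iRODS audit format.

```python
from string import Template

# Hypothetical audit-trail layout: timestamp|user|operation|path|status
AUDIT_TRAIL = """\
2007-06-01T10:00:00|alice|put|/zone/coll/a.dat|ok
2007-06-01T10:00:05|system|checksum|/zone/coll/a.dat|ok
2007-06-02T09:12:00|bob|delete|/zone/coll/b.dat|denied
2007-06-03T11:30:00|system|replicate|/zone/coll/a.dat|ok
"""

# Template used to format the extracted state information.
REPORT = Template(
    "Operations audited: $total\n"
    "Denied operations: $denied\n"
    "Files ingested without a checksum event: $unverified\n"
)


def parse(trail: str) -> list:
    fields = ("time", "user", "op", "path", "status")
    return [dict(zip(fields, line.split("|"))) for line in trail.splitlines() if line]


def compliance_report(trail: str) -> str:
    records = parse(trail)
    ops_per_path = {}
    for r in records:
        ops_per_path.setdefault(r["path"], set()).add(r["op"])
    unverified = sorted(p for p, ops in ops_per_path.items()
                        if "put" in ops and "checksum" not in ops)
    return REPORT.substitute(
        total=len(records),
        denied=sum(r["status"] == "denied" for r in records),
        unverified=", ".join(unverified) or "none",
    )
```

Keeping the criteria in code and the wording in a template means the same parsed trail can feed differently formatted reports.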
iRODS Development
• NSF - SDCI grant “Adaptive Middleware for Community Shared Collections”
  • iRODS development, SRB maintenance
• NSF - Ocean Research Interactive Observatory Network (ORION)
  • Real-time sensor data stream management
• NSF - Temporal Dynamics of Learning Center data grid
  • Management of Institutional Review Board approval
iRODS Development Status
• Current release is version 0.9.2
  • June 2007
• Production release will be version 1.0
  • Fall quarter 2007
• International collaborations
  • SHAMAN - University of Liverpool
    • Sustaining Heritage Access through Multivalent ArchiviNg
  • UK e-Science data grid
  • IN2P3 in Lyon, France
  • DSpace policy management
Planned Development
• GSI support
• Time-limited sessions via one-way hash authentication
• Python client library
• GUI browser (AJAX in development)
• Driver for HPSS (in development)
• Driver for SAM-QFS
• Porting to additional versions of Unix/Linux
• Porting to Windows
• Support for MySQL as the metadata catalog
• API support packages based on the existing mounted collection driver
• MCAT to iCAT migration tools
• Extensible metadata, including a Database Access Interface
• Zones/Federation
• Auditing - mechanisms to record and track iRODS persistent state changes
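The slides do not describe how the planned time-limited sessions will work. One common way to build such sessions from a one-way hash is an HMAC over the user name and an expiry time, which servers holding a shared secret can verify without storing session state. The following is a sketch of that general technique only; the token layout, `SECRET`, and the function names are assumptions, not the planned iRODS scheme.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared server-side secret; never sent to clients.
SECRET = b"server-side secret (hypothetical)"


def issue_token(user: str, lifetime_s: int, now: Optional[float] = None) -> str:
    """Return 'user:expiry:mac', where mac is HMAC-SHA256 over 'user:expiry'."""
    expires = int((time.time() if now is None else now) + lifetime_s)
    payload = f"{user}:{expires}"
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        user, expires, mac = token.rsplit(":", 2)
    except ValueError:
        return None  # malformed token
    payload = f"{user}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return None  # forged or altered token
    if (time.time() if now is None else now) > int(expires):
        return None  # session has expired
    return user
```

Because the expiry time is covered by the hash, a client cannot extend its own session; it can only request a new token.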
For More Information (iRODS Tutorial on Thursday)