A Robust Framework for Real-Time Distributed Processing of Satellite Data
Yongsheng Zhao, Shahram Tehranian, Viktor Zubko, Anand Swaroop
AC Technologies, Inc., Advanced Computation and Information Services, 4640 Forbes Blvd., Suite 320, Lanham, MD 20706
Keith Mckenzie
National Environmental Satellite Data Information Service (NESDIS), National Oceanic and Atmospheric Administration (NOAA), U.S. Department of Commerce, Suitland, MD 20746
Agenda
Introduction
Linux Clusters
GIFTS Data Processing Framework
Architectural Design
Implementations
GIFTS Science Algorithm Pipeline
Results
Conclusion and Future Work
Introduction
Geosynchronous Imaging Fourier Transform Spectrometer (GIFTS)
  Large area Focal Plane Array (LFPA)
    Each frame covers 512 km x 512 km at ground level
  Fourier Transform Spectrometer (FTS)
    Provides high spectral resolution of 0.6 cm⁻¹, yielding vertical resolutions of approx. 1 km at nadir to 3 km near the edge of the LFPA
  Retrieves 16,384 sounding observations per frame, scanned in 11 seconds
  Two detector arrays cover the LW (685–1130 cm⁻¹) and SMW (1650–2250 cm⁻¹) bands
  Anticipated GIFTS Level-0 data rate is 1.5 Terabytes per day
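The figures above imply a sustained processing load that can be worked out directly. The following is a minimal sketch of that arithmetic, not part of the framework: the function names are hypothetical, the terabyte is assumed to be decimal (10^12 bytes), and frames are assumed to arrive back to back.

```cpp
#include <cassert>

// Figures quoted on the slide.
constexpr double kBytesPerDay = 1.5e12;      // 1.5 TB/day; decimal terabytes are an assumption
constexpr double kSecondsPerDay = 24.0 * 60.0 * 60.0;
constexpr double kSoundingsPerFrame = 16384.0;
constexpr double kSecondsPerFrame = 11.0;

// Average Level-0 input rate in bytes per second.
constexpr double average_bytes_per_second() {
    return kBytesPerDay / kSecondsPerDay;
}

// Sounding observations that must be processed per second to keep up,
// assuming frames are scanned continuously.
constexpr double soundings_per_second() {
    return kSoundingsPerFrame / kSecondsPerFrame;
}

// Time budget per sounding for each of `workers` equally loaded workers.
double seconds_per_sounding(int workers) {
    assert(workers > 0);
    return workers / soundings_per_second();
}
```

Under these assumptions the average input rate is about 17.4 MB/s and roughly 1,489 soundings must be completed every second, which leaves a single processor well under a millisecond per sounding.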
Introduction (2)
Satellite ground data processing will require considerable computing power to process data in real time
Cluster technologies based on multi-processor systems are currently the economically viable option
A fault-tolerant real-time data processing framework is proposed as a platform for encapsulating science algorithms for satellite data processing on Linux clusters
Linux Clusters
Cost-effective: the performance-to-price ratio is much better than that of traditional parallel computers.
Consists of Commercial Off-The-Shelf (COTS) products, so components can easily be replaced.
Runs a free software operating system such as Linux.
Can be customized to customer’s specific applications.
Linux Clusters (2)
Linux cluster from Linux Networx
  18 dual AMD Opteron desktop 240
  1 GB DDR SDRAM per CPU
  120/40.8 GB hard drives
  SUSE Linux Enterprise Server for AMD64
  Myrinet and Gigabit Ethernet connections
Linux Clusters (3)
Linux cluster capabilities:
  Performance
  Scalability
  High availability, server failover support
  Comprehensive system monitoring
    Track system health, predict computing trends, avoid computing bottlenecks.
  Workload management
    Manage multiple users and applications, allowing for the efficient use of system resources.
  Version controlled system image management
    Send a system image from the host node to the rest of the cluster system.
    Allow system administrators to track upgrades and changes to the system image.
    Fall back on an older, known working version when necessary.
    Add or update system images within minutes regardless of the number of nodes.
Linux Clusters (4)
GIFTS Data Processing Framework
Provide an operational platform within which algorithm software pipelines can be deployed for Level-0 (un-calibrated interferograms) to Level-1 (calibrated radiances) data processing.
Hide the complexities involved with an operational cluster environment.
Separate the science algorithm layer from the cluster management layer.
Provide FAULT TOLERANCE.
Architectural Design
Provide task scheduling through a master process.
  Divide a job into parallel tasks.
  Assign tasks to worker processes.
Retrieve algorithm-specific parameters from a database server.
Retrieve data sets corresponding to tasks from a data input server.
Collect solutions corresponding to tasks on a data output server.
Provide fault tolerance for all crucial components.
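The master's two scheduling steps, dividing a job into tasks and assigning them to workers, can be illustrated as follows. This is only a sketch under stated assumptions: the `Task` layout, the fixed chunk size, and the round-robin policy are all hypothetical, since the slides do not say how the framework partitions a frame or selects a worker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task: a contiguous range of observations within one job.
struct Task {
    std::size_t id;
    std::size_t first;  // index of the first observation in the range
    std::size_t count;  // number of observations in the range
};

// Divide `total` observations into tasks of at most `chunk` observations.
std::vector<Task> divide_job(std::size_t total, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<Task> tasks;
    for (std::size_t first = 0; first < total; first += chunk) {
        const std::size_t remaining = total - first;
        const std::size_t count = remaining < chunk ? remaining : chunk;
        tasks.push_back(Task{tasks.size(), first, count});
    }
    return tasks;
}

// Assign tasks to workers round-robin (an assumed policy).
// Returns, for each worker, the ids of the tasks it receives.
std::vector<std::vector<std::size_t>> assign_round_robin(
        const std::vector<Task>& tasks, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::vector<std::size_t>> plan(workers);
    for (const Task& t : tasks) {
        plan[t.id % workers].push_back(t.id);
    }
    return plan;
}
```

For example, `divide_job(16384, 1000)` splits one frame's soundings into 17 tasks, the last of which holds the remaining 384 observations.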
Architectural Design – System Reliability
Active/standby redundancy for all servers
  1:1 redundancy: a hot-standby unit monitors the active unit
  Active unit saves its state in a checkpoint file
  In case of failure of the active unit, the standby unit takes over, reading the last checkpoint file and recreating the server state
Load sharing redundancy for workers
  All worker units are active and carry equal shares of the work load
  No standby unit
  In case a worker unit fails, the remaining units take over its work load
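One common way for a hot-standby unit to monitor an active unit is a heartbeat with a timeout. The slides do not state which mechanism the framework uses, so the sketch below is an assumption throughout: `StandbyMonitor`, its methods, and the timeout rule are hypothetical. Time is passed in explicitly so the logic can be exercised without real waiting.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

enum class Role { Standby, Active };

// Hypothetical hot-standby monitor: it promotes itself to Active once the
// active unit has been silent for longer than `timeout`.
class StandbyMonitor {
public:
    StandbyMonitor(Clock::time_point start, std::chrono::milliseconds timeout)
        : last_heartbeat_(start), timeout_(timeout) {
        assert(timeout.count() > 0);
    }

    // Called whenever a heartbeat from the active unit arrives.
    void on_heartbeat(Clock::time_point now) {
        if (role_ == Role::Standby) {
            last_heartbeat_ = now;
        }
    }

    // Called periodically; takes over if the active unit has timed out.
    // After takeover the unit would restore state from the last checkpoint.
    Role poll(Clock::time_point now) {
        if (role_ == Role::Standby && now - last_heartbeat_ > timeout_) {
            role_ = Role::Active;
        }
        return role_;
    }

private:
    Clock::time_point last_heartbeat_;
    std::chrono::milliseconds timeout_;
    Role role_ = Role::Standby;
};
```

Once promoted, the monitor stays in the Active role; handing control back to a repaired unit is outside the scope of this sketch.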
GIFTS Data Processing Framework Implementation
Implemented in C++ using ACE (Adaptive Communication Environment).
Current version contains source code for the Master, Input, Output, Worker and Reference Database servers.
Provides algorithm independence through a set of base classes.
Workers may be added and removed during run time.
Works with Gigabit Ethernet and Myrinet.
Provides easy server configuration.
  hosts.conf
  misc.conf
Provides a complete set of APIs
  doc\html\index.html
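Algorithm independence through base classes usually means that worker code is written against an abstract interface and the science modules are plugged in as subclasses. The sketch below illustrates that idea only; `AlgorithmModule`, `Scale`, `Offset` and `run_pipeline` are hypothetical names, not the framework's actual classes, and the two toy modules stand in for real science algorithms.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using Samples = std::vector<double>;

// Hypothetical base class; the framework's real class names are not given.
class AlgorithmModule {
public:
    virtual ~AlgorithmModule() = default;
    virtual std::string name() const = 0;
    virtual Samples process(const Samples& input) const = 0;
};

// Toy module: multiplies every sample by a gain.
class Scale : public AlgorithmModule {
public:
    explicit Scale(double gain) : gain_(gain) {}
    std::string name() const override { return "scale"; }
    Samples process(const Samples& input) const override {
        Samples out(input);
        for (double& v : out) v *= gain_;
        return out;
    }
private:
    double gain_;
};

// Toy module: adds a constant offset to every sample.
class Offset : public AlgorithmModule {
public:
    explicit Offset(double offset) : offset_(offset) {}
    std::string name() const override { return "offset"; }
    Samples process(const Samples& input) const override {
        Samples out(input);
        for (double& v : out) v += offset_;
        return out;
    }
private:
    double offset_;
};

// Worker-side code depends only on the base class, so modules can be
// swapped without touching the scheduling or communication layers.
Samples run_pipeline(const std::vector<std::unique_ptr<AlgorithmModule>>& modules,
                     Samples data) {
    for (const auto& module : modules) {
        assert(module != nullptr);
        data = module->process(data);
    }
    return data;
}
```

A pipeline is then just an ordered vector of modules, for example a `Scale` followed by an `Offset`, and adding a new algorithm means adding one subclass.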
GIFTS Data Processing Framework Implementation (2) – Server Failover
High availability and server failover
  Active/Standby servers
  Mirror file systems in real time from active to standby system.
  Requires identical hardware
Performs checkpointing using the Boost Serialization Library in C++
  Provides both XML and binary formats
  Server starts from last checkpoint file
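The checkpoint-and-restart cycle can be illustrated without any external dependency. The sketch below deliberately swaps Boost Serialization for a hand-written text format over standard streams, so it shows the save/restore pattern rather than the framework's actual archive code; `MasterState` and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical server state that must survive a failover.
struct MasterState {
    std::size_t next_task_id = 0;
    std::vector<std::size_t> pending;  // ids of tasks not yet completed
};

// Write the state as one line of text: next id, count, then the pending ids.
// (Stand-in for a Boost Serialization archive.)
void save_checkpoint(std::ostream& os, const MasterState& state) {
    assert(os.good());
    os << state.next_task_id << ' ' << state.pending.size();
    for (std::size_t id : state.pending) os << ' ' << id;
    os << '\n';
}

// Rebuild the state from a checkpoint. On a truncated or malformed
// checkpoint, returns false and leaves `state` unchanged.
bool load_checkpoint(std::istream& is, MasterState& state) {
    MasterState restored;
    std::size_t count = 0;
    if (!(is >> restored.next_task_id >> count)) return false;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t id = 0;
        if (!(is >> id)) return false;
        restored.pending.push_back(id);
    }
    state = restored;
    return true;
}
```

In use, the active server would call `save_checkpoint` on a `std::ofstream` after each state change, and a server starting up would call `load_checkpoint` on the most recent file before accepting work.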
GIFTS Data Processing Framework Implementation (3) – Worker Failover and Task Migration
Master detects failed tasks and re-schedules them to other workers
  Two queues are maintained for ‘new tasks’ and ‘assigned tasks’
  Task execution time is dynamically estimated based on previous actual task execution times and an over-estimation factor
  Tasks which are not completed within the estimated execution time are considered lost and will be re-scheduled
  Master moves a lost task from the ‘assigned task’ queue to the head of the ‘new task’ queue, so that it can be re-assigned to another worker
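The two-queue scheme with a dynamic timeout can be sketched as below. The class and method names are hypothetical, times are plain seconds passed in by the caller, and the estimate is taken to be the mean of previous execution times multiplied by the over-estimation factor; the slides do not say which statistic the framework actually uses, so the mean is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical master-side scheduler with 'new' and 'assigned' task queues.
class TaskScheduler {
public:
    TaskScheduler(double over_estimation_factor, double initial_estimate_s)
        : factor_(over_estimation_factor), initial_estimate_(initial_estimate_s) {
        assert(factor_ >= 1.0 && initial_estimate_ > 0.0);
    }

    void add_task(int id) { new_tasks_.push_back(id); }

    // Hand the task at the head of the 'new' queue to a worker at time `now`.
    bool assign(double now, int& id) {
        if (new_tasks_.empty()) return false;
        id = new_tasks_.front();
        new_tasks_.pop_front();
        assigned_.push_back({id, now});
        return true;
    }

    // A worker reports completion; record the actual execution time.
    void complete(int id, double now) {
        for (auto it = assigned_.begin(); it != assigned_.end(); ++it) {
            if (it->id == id) {
                total_time_ += now - it->start;
                ++completed_;
                assigned_.erase(it);
                return;
            }
        }
    }

    // Allowed execution time: mean of past times (assumed) times the factor.
    double estimate() const {
        const double mean = completed_ ? total_time_ / completed_ : initial_estimate_;
        return mean * factor_;
    }

    // Move overdue tasks to the head of the 'new' queue; returns how many.
    std::size_t reschedule_lost(double now) {
        std::size_t moved = 0;
        const double limit = estimate();
        for (auto it = assigned_.begin(); it != assigned_.end();) {
            if (now - it->start > limit) {
                new_tasks_.push_front(it->id);
                it = assigned_.erase(it);
                ++moved;
            } else {
                ++it;
            }
        }
        return moved;
    }

private:
    struct Assigned { int id; double start; };
    double factor_;
    double initial_estimate_;
    double total_time_ = 0.0;
    std::size_t completed_ = 0;
    std::deque<int> new_tasks_;
    std::vector<Assigned> assigned_;
};
```

Because a lost task goes to the head of the 'new' queue, it is the next one handed out, which keeps a failed worker from delaying its tasks behind the rest of the backlog.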
GIFTS Science Algorithms
Algorithm pipeline consisting of a set of modules [Knuteson 2004]
  Initial FFT
  Detector nonlinearity