Objectives
• Fault tolerance approaches used in previous missions
• Radiation testing considerations for commercial devices
• State of the art in technology and planned roadmap
Topics
• Mass memory data recorders
• Command and data handling
• COTS-based onboard processors
• Commercial device radiation testing considerations
• Conclusions
Faster, better, cheaper – moving toward no longer just picking two!
Mission requirements increasing
• Higher resolution data acquisition driving processing and storage requirements
• Onboard processing and/or downlink often the system bottleneck
• Increased need for autonomous functionality affecting system “overhead”
Design challenges also increasing
• SWaP limitations on payloads not relaxing quickly
• Flexible, multiuse payloads sought to limit NRE
• “Radiation-hardened” components often not cost-effective for high-performance applications
• Commercial-Off-The-Shelf (COTS) parts often provide improved performance but typically require mitigation to achieve the same level of fault tolerance
Achieving required level of fault tolerance often the most limiting factor in meeting mission objectives (after programmatic considerations)
Worldview Recorder
Largest capacity data recorder flown to date at 2 Tbits
Provides high-speed recording at 4 Gbps with 800 Mbps downlink
Fault tolerance features
• EDAC on a per chip basis
• 2-n redundancy and cross
• Data downlink engine with CRC generator used to detect link errors
• Bose-Chaudhuri-Hocquenghem (BCH) decoder verifies commands and data
• Emergency downlink data collection and formatting (CCSDS/CADU format)
– If CPU unable to format downlink messages, critical data is gathered by the control card, formatted, and downlinked autonomously
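As an illustration of how a CRC appended to each downlink frame exposes link errors, here is a minimal sketch. The slides do not say which CRC the recorder uses; the polynomial (0x1021), the initial value (0xFFFF), and the helper names `crc16_ccitt`, `append_crc`, and `frame_ok` are assumptions chosen for the example.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF, poly: int = 0x1021) -> int:
    """Bitwise CRC-16 (assumed CCITT polynomial, MSB first, no reflection)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(payload: bytes) -> bytes:
    """Transmit side: append the 16-bit CRC, big-endian, to the payload."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def frame_ok(frame: bytes) -> bool:
    """Receive side: recompute the CRC and compare with the trailing field."""
    if len(frame) < 2:
        return False
    return crc16_ccitt(frame[:-2]) == int.from_bytes(frame[-2:], "big")


if __name__ == "__main__":
    frame = append_crc(b"telemetry")          # illustrative payload only
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(frame_ok(frame), frame_ok(corrupted))
```

A CRC only detects errors; recovery still depends on retransmission or on a separate error-correcting code such as the BCH decoder listed above.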
Memory fault tolerance
• EDAC protection on data
• Full chip SDRAM failure detection and avoidance
• Autonomous memory scrub routine enabled
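The interaction between EDAC and a scrub routine can be sketched as follows. The slides do not name the EDAC code or word width; this example assumes a toy extended Hamming (8,4) single-error-correct, double-error-detect code, and the functions `encode`, `decode`, and `scrub` are hypothetical names used only for illustration.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit SEC-DED word (assumed toy code)."""
    bits = [0] * 8  # index 0: overall parity; 1..7: Hamming positions
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1  # makes total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (nibble, status); status is ok, corrected, or uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    parity_error = sum(bits) & 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        bits[syndrome] ^= 1  # single-bit error (position 0 is the parity bit)
        status = "corrected"
    else:
        status = "uncorrectable"  # nonzero syndrome, even parity: double error
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status


def scrub(memory: list) -> dict:
    """Walk memory, rewrite correctable words, and count each outcome."""
    counts = {"ok": 0, "corrected": 0, "uncorrectable": 0}
    for addr, word in enumerate(memory):
        nibble, status = decode(word)
        counts[status] += 1
        if status == "corrected":
            memory[addr] = encode(nibble)  # write back the repaired word
    return counts


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[3] ^= 1 << 5      # injected single-bit upset
    memory[7] ^= 0b110       # injected double-bit upset
    print(scrub(memory))
```

Scrubbing periodically keeps single-bit upsets from accumulating into double-bit errors that the code can detect but not repair.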
System-level fault tolerance
• Hardened processor
• Read and write transactions return acknowledge bytes
• Parity, stop, start, or timeout errors will trigger a retransmission
• Up to 16 attempts to transmit before master gives up and notifies user logic
• Watchdog timer / heartbeat monitor function on subsystems provides a backup
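The acknowledge-and-retransmit behavior described above can be sketched as a bounded retry loop. Only the 16-attempt limit comes from the slides; the `LinkError` exception, the `transact` function, the `make_flaky_sender` test double, and the 0xA5 acknowledge value are hypothetical and stand in for the real bus interface.

```python
MAX_ATTEMPTS = 16  # attempt limit stated in the slides


class LinkError(Exception):
    """Hypothetical stand-in for a parity, stop, start, or timeout error."""


def transact(send_once, max_attempts: int = MAX_ATTEMPTS):
    """Call send_once() until it returns an acknowledge byte.

    Returns (ack, attempts) on success, or (None, attempts) once the
    limit is reached so the caller can notify user logic.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_once(), attempt
        except LinkError:
            continue  # error detected: retransmit
    return None, max_attempts


def make_flaky_sender(failures: int, ack: int = 0xA5):
    """Test double: fails `failures` times, then returns `ack`."""
    state = {"left": failures}

    def send_once():
        if state["left"] > 0:
            state["left"] -= 1
            raise LinkError("simulated timeout")
        return ack

    return send_once


if __name__ == "__main__":
    print(transact(make_flaky_sender(3)))    # succeeds on the 4th attempt
    print(transact(make_flaky_sender(16)))   # master gives up after 16
```

Bounding the retries keeps a persistent fault from stalling the master indefinitely, which is where a watchdog or heartbeat monitor would take over.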
Application Independent Processor Goals
Application Independent Processor (AIP) designed for Responsive Space
• Low cost
• High performance
• Rapid deployment through adaptability
• Designed for multiple missions including image processing, sat. comm., etc.
Key System Development Requirements
• Scalable processing from 9 to over 400 GFLOPS
• Reconfigurable, on-orbit
• Support Terabit Data Storage
• Usage of open standards
• SEE Tolerant system
• Flexible I/O architecture
• Provide user interface for rapid development
The AIP was first deployed as the processing core for Raytheon’s Advanced Responsive Tactically Effective Military Imaging Spectrometer (ARTEMIS)
Tactical operations with real-time downlink for command and control
ARTEMIS is the imaging system
First incarnation of the AIP
The flexible AIP is being deployed for other types of missions
Need for performance and SEE fault tolerance
• High-end processors, memory, interconnect technology, and other components required to meet performance targets
• Functional upsets directly affect the system’s availability
• Therefore, components and mitigation strategies must be correctly and fully characterized to ensure mission success
Lessons learned from SEE experiments
• Test methodology must center on application and not be a generic study
• Initial tests serve to screen hardware moving toward full capability
• When possible use exact system configuration to obtain meaningful results
– All components engaged in test and included in SEFI analysis
– Design tests around the mission when possible
– Tests performed at speed
• Build in as much visibility into the system as possible to observe SEFIs
– Component complexity can make pinpointing the cause of SEFIs difficult
– Numerous interacting components can mask SEFIs (latent)
– Software is just as, if not more, complex than the underlying hardware
Commercial grade components for initial tests
• Reduced costs
• Expedited development activities
• Cooling challenges often limit test facility options
– Difficult to cool a >10 W device in vacuum
Commercial test support equipment
• Procure off-the-shelf devices when possible
• Built-in features often more than adequate
– Minimal need to develop equipment and test software for commercial devices
Commercial development tools and analysis techniques
• Hardware interface equipment and software analysis tools instrumental
• Heavy-ion tests found to be the most productive for mitigation development
• Post-test analysis scripts greatly aided in determining SEFI causes
• Once analysis infrastructure developed, quick turnaround in development
• “Beam range test” and script proved invaluable for verifying device thickness
Application-oriented test methodology with three distinct goals
1. Determine baseline susceptibility of registers, memory and other components
• Limiting cross section created to aid in bounding development activities
2. Determine baseline SEFI cross section for chosen application
• Real stimulus data with tests performed at speed
• Focus on gathering data and trace files to help mitigation development
3. Undertake iterative mitigation development process to minimize cross section
• For each prominent observed or expected SEFI a range of mitigation approaches typically developed and retested
• Successful approaches retained and additional data helped to guide the next round of development
• Process stops when limiting cross section cannot be improved (e.g. below uncorrectable MBU) or subcomponent geometric cross section sufficiently small
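For readers unfamiliar with the quantities tracked in this methodology, the sketch below uses the standard definitions: cross section is the number of observed events divided by the particle fluence, and effective LET scales the normal-incidence LET by 1/cos(θ) for a beam tilted θ from normal. The function names and the numbers in the usage lines are illustrative assumptions, not measurements from these tests.

```python
import math


def cross_section(events: int, fluence: float) -> float:
    """SEFI cross section in cm^2/device: events / fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


def effective_let(let_normal: float, angle_deg: float) -> float:
    """Effective LET (MeV·cm^2/mg) for a beam tilted angle_deg from normal."""
    if not 0 <= angle_deg < 90:
        raise ValueError("angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(angle_deg))


if __name__ == "__main__":
    # Illustrative values only: 50 SEFIs observed over 1e7 ions/cm^2,
    # with a normal-incidence LET of 10 MeV·cm^2/mg at a 60 degree tilt.
    print(cross_section(50, 1e7))
    print(effective_let(10.0, 60.0))
```

Comparing the cross section before and after each mitigation round, at the same effective LET, is one way to judge whether a mitigation approach is worth retaining.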
Hundreds of individual SEFI modes typically observed and characterized
• Some correctable, some understood but not correctable, and some unknown
• Focus on reducing the “long poles in the tent”
• Uncorrectable and unknown failures typically handled at the system level
Config 1 includes one OS and Configs 2 and 3 include a different OS
Network traffic tested includes Ethernet-only and full stack up to TCP
Best fault tolerance with Config 2 but improved performance with Config 3
[Figure: SEFI cross section (cm^2/device) versus effective LET (MeV·cm^2/mg). Series: Configuration 1 (Ethernet only), Configuration 1 (full stack), Configuration 2 (full stack), Configuration 3 (full stack).]