ReconOS: An Operating System Approach for Reconfigurable Computing

Andreas Agne, University of Paderborn
Markus Happe and Ariane Keller, ETH Zürich
Enno Lübbers, Intel Labs Europe
Bernhard Plattner, ETH Zürich
Marco Platzner and Christian Plessl, University of Paderborn

Published by the IEEE Computer Society. 0272-1732/14/$31.00 © 2014 IEEE

The ReconOS operating system for reconfigurable computing offers a unified multithreaded programming model and OS services for threads executing in software and threads mapped to reconfigurable hardware. By semantically integrating hardware accelerators into a standard OS environment, ReconOS allows for rapid design-space exploration, supports a structured application development process, and improves the portability of applications between different reconfigurable computing systems.

Today’s high-density field-programmable gate arrays (FPGAs) allow for implementing very complex circuits. Still, reconfigurable computing applications are rarely mapped exclusively to the FPGA accelerator. Application parts amenable to parallel execution, customization, and deep pipelining are often implemented as custom hardware to improve performance or energy efficiency. Other parts, especially code that is highly sequential or difficult to implement as custom hardware, are executed in software mapped to a CPU. This decomposition of applications into separate, communicating parts that require synchronization among them is also widely used in pure software systems in order to separate concerns and achieve concurrent or asynchronous processing. In software systems, the operating system (OS) standardizes these communication and synchronization mechanisms and provides abstractions for encapsulating the execution units (processes and threads), communication, and synchronization.

Reconfigurable computing systems still lack an established OS foundation that covers both software and hardware parts. Instead, communication and synchronization are usually handled in a highly system- and application-specific way, which tends to be error prone, limit the designer’s productivity, and prevent portability of applications between different reconfigurable computing systems.

The ReconOS operating system, programming model, and system architecture offer unified OS services for functions executing in software and hardware and a standardized interface for integrating custom hardware accelerators. ReconOS leverages the well-established multithreading programming model and extends a host OS with hardware thread support. These extensions let the hardware threads interact with software threads using the same standardized OS mechanisms, for example, semaphores, mutexes, condition variables, and message queues. From the perspective of an application, it is thus completely transparent whether a thread is executing in software or hardware. The availability of an OS layer providing symmetry between software and hardware threads provides the following benefits for reconfigurable computing systems:
• The application development process can be structured in a step-by-step fashion with an all-in-software implementation as a starting point. Performance-critical application parts can then be turned into hardware threads one by one to successively explore the hardware/software design space.
• The portability of applications between different reconfigurable computing systems is improved by using defined OS interfaces for communication and synchronization instead of low-level platform-specific interfaces.
• The unified appearance of hardware and software threads from the application’s perspective allows functions to move between software and hardware during runtime, which supports the design of adaptive computing systems that exploit partial reconfiguration.
We discuss the evolution of operating systems for reconfigurable computing and how ReconOS relates to this heritage in the “Operating Systems for Reconfigurable Computing” sidebar.
Programming model

The key idea of ReconOS is to extend the multithreading programming model across the hardware/software interface. In multithreaded programming, applications are composed of objects such as threads, message queues, and semaphores, each of which has a strictly defined interface and purpose. The application’s functionality is partitioned into threads, which in our case can be either blocks of sequential software or parallel hardware modules. Threads communicate and synchronize using one or more of the programming model’s objects; for example, they can pass data using message queues or mailboxes, explicitly coordinate execution through barriers or semaphores, or implicitly synchronize access to shared resources by locking and unlocking mutually exclusive locks (mutexes). These objects and their interactions are widely used in well-established APIs for programming multithreaded software applications. A major advantage that developers can draw from the ReconOS approach is that these abstractions can be used not only for software threads, but also for optimized hardware implementations of data-parallel functions (the hardware threads) without sacrificing the expressiveness and portability of the application description.
Consider the example software thread sketched in Figure 1. The thread receives packets streaming in via ingress mailbox mbox_in, processes them in a user-defined way, sends the processed packets to egress mailbox mbox_out, and updates a packet counter stored in a shared variable protected by the lock count_mutex. Using standard APIs for message passing and synchronization, the software thread accesses OS services in an expressive, straightforward, and portable way. As an additional benefit, such a thread description manages to clearly separate thread-specific processing from OS calls.
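The structure of such a thread can be illustrated with a minimal Python model. This is a sketch, not the article's C code: the names mbox_in, mbox_out, and count_mutex follow the text, while the STOP sentinel, the process() placeholder, and the run_demo() helper are assumptions introduced only to make the example self-contained.

```python
import queue
import threading

# Stand-ins for the objects named in the text: two mailboxes and a
# mutex-protected packet counter.
mbox_in = queue.Queue()
mbox_out = queue.Queue()
count_mutex = threading.Lock()
packet_count = 0

STOP = None  # sentinel that ends the loop (an assumption, not from the article)


def process(packet):
    """Placeholder for the user-defined, thread-specific processing."""
    return packet.upper()


def packet_thread():
    """Receive a packet, process it, forward it, and count it."""
    global packet_count
    while True:
        packet = mbox_in.get()         # blocking OS call: mailbox receive
        if packet is STOP:
            break
        mbox_out.put(process(packet))  # OS call: mailbox send
        with count_mutex:              # OS calls: mutex lock and unlock
            packet_count += 1


def run_demo(packets):
    """Feed packets through one worker thread and collect the results."""
    worker = threading.Thread(target=packet_thread)
    worker.start()
    for p in packets:
        mbox_in.put(p)
    mbox_in.put(STOP)
    worker.join()
    return [mbox_out.get() for _ in packets], packet_count
```

Note how the only thread-specific part is process(); everything else is a call to a standard OS object, which is the separation the text describes.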
Figure 2 shows a ReconOS hardware implementation of the same thread, partitioned into similar thread-specific logic and OS interactions. While the thread-specific user logic contains the hardware thread’s datapath and is limited only by available FPGA resources, the OS interactions of a hardware thread are captured by the OS synchronization finite state machine (OSFSM). Together with the OS interface (OSIF), this state machine enables seamless OS calls from within hardware modules. The developer specifies the OSFSM using a standard VHDL state machine description, as shown in Figure 3. For accessing OS functions in this state machine, ReconOS provides a VHDL library that wraps all OS calls with VHDL procedures. The OSFSM’s transitions are guarded by an OS-controlled signal done (line 47), so that blocking OS calls, such as mutex_lock(), can temporarily inhibit the execution of a hardware thread.
Consequently, the OSFSM in VHDL closely mimics the sequence of OS calls within the equivalent software thread: it reads a packet from a mailbox, passes it to a separate module to be processed, writes the processed packet back to another mailbox, and increments a thread-safe counter. The description of the actual user logic, however, may well differ from the software realization, as this is the area where the fine-grained parallel execution of an FPGA-optimized implementation can realize its strengths, unhindered by the necessarily sequential execution of OS calls.
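The guarding role of the done signal can be illustrated with a small software model of such a state machine. This is a sketch in Python rather than VHDL, and the state names are hypothetical; they only mirror the sequence of OS calls described above. The machine advances to the next state in a cycle only if done is asserted, so a blocking call simply holds the current state.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names that follow the call sequence in the text.
    MBOX_GET = auto()   # read a packet from the ingress mailbox
    COMPUTE = auto()    # hand the packet to the user logic
    MBOX_PUT = auto()   # write the processed packet to the egress mailbox
    LOCK = auto()       # lock the counter mutex
    COUNT = auto()      # increment the shared counter
    UNLOCK = auto()     # unlock the counter mutex


NEXT = {
    State.MBOX_GET: State.COMPUTE,
    State.COMPUTE: State.MBOX_PUT,
    State.MBOX_PUT: State.LOCK,
    State.LOCK: State.COUNT,
    State.COUNT: State.UNLOCK,
    State.UNLOCK: State.MBOX_GET,
}


def step(state, done):
    """One clock cycle: advance only when the current call reports done."""
    return NEXT[state] if done else state


def run(done_per_cycle, start=State.MBOX_GET):
    """Replay a trace of the done signal and return the visited states."""
    state, trace = start, [start]
    for done in done_per_cycle:
        state = step(state, done)
        trace.append(state)
    return trace
```

For example, run([False, False, True, True]) stays in MBOX_GET for two cycles while the mailbox read blocks, then moves on to COMPUTE and MBOX_PUT.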
ReconOS architecture

The ReconOS runtime system architecture provides the structural foundation to support the multithreading programming model and its execution on CPU/FPGA platforms. Figure 4 shows a conceptual view of a typical system that is decomposed into the application software, OS kernel, and hardware architecture. The application’s software threads are usually executed on the main CPU alongside the host OS kernel that encapsulates APIs, libraries, and all programming model objects, as well as lower-level functions such as memory management and device drivers. The ReconOS runtime environment consists of hardware components that provide interfaces, communication channels, and other functionality, such as memory access and address translation to the hardware threads. Additionally, the runtime system comprises software components in the form of libraries and kernel modules that offer an interface to the hardware, the OS, and the application’s software threads.
A key component for multithreading across the hardware/software boundary is the delegate thread, a lightweight software thread that interfaces between the hardware thread and the OS. When a hardware thread needs to execute an OS function, it relays this request through the OSIF to the delegate thread using platform-specific (but application-independent) communication interfaces. The delegate thread then executes the desired OS functions on behalf of its associated hardware thread. Hence, from the OS kernel’s point of view, only software threads exist and interact, while the hardware threads are completely hidden behind their respective delegate threads. From the application programmer’s point of view, however, the delegate threads are hidden by the ReconOS runtime environment, and only the application’s hardware and software threads exist. This delegate mechanism together with the unified thread interfaces gives ReconOS exceptional transparency regarding a thread’s execution mode, that is, whether it runs in software or hardware. While the delegate mechanism causes a certain overhead for executing OS calls, the resulting simplicity of switching thread implementations between software and hardware greatly facilitates system generation and design space exploration.
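The delegate mechanism can be sketched as a software thread that waits for requests and executes them on behalf of its hardware thread. The following Python model is illustrative only: the request format (a name plus an argument tuple), the "exit" request, and the two queues standing in for the OSIF are assumptions, not the ReconOS implementation.

```python
import queue
import threading


def delegate_thread(osif_requests, osif_replies, os_calls):
    """Serve OS requests from one hardware thread until it signals exit.

    Each request is a tuple (function name, arguments). The names and the
    tuple layout are hypothetical.
    """
    while True:
        name, args = osif_requests.get()   # wait for the hardware thread
        if name == "exit":
            break
        result = os_calls[name](*args)     # run the OS call in software
        osif_replies.put(result)           # acknowledge, with return value


def demo():
    """Model a hardware thread that takes a lock and posts to a mailbox."""
    lock = threading.Lock()
    mbox = queue.Queue()
    os_calls = {
        "mutex_lock": lock.acquire,
        "mutex_unlock": lock.release,
        "mbox_put": mbox.put,
    }
    requests, replies = queue.Queue(), queue.Queue()
    delegate = threading.Thread(
        target=delegate_thread, args=(requests, replies, os_calls))
    delegate.start()

    # The "hardware thread" side: issue a request, then block on the reply.
    for call in [("mutex_lock", ()), ("mbox_put", (42,)), ("mutex_unlock", ())]:
        requests.put(call)
        replies.get()
    requests.put(("exit", ()))
    delegate.join()
    return mbox.get(), lock.locked()
```

In this model the OS objects (the lock and the mailbox) are only ever touched by the delegate, which matches the point that the kernel sees software threads exclusively.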
The ReconOS concept is rather general and has been ported to several FPGA families, main CPU architectures, and host operating systems (see the “ReconOS Versions and Availability” sidebar). For the rest of this article, we describe the implementation of [...]

[...] standard OS kernels, and a step-by-step design process starting with a fully functional software prototype on a desktop.
[...] within the OSFSM and access the OSIF through the two first-in, first-out (FIFO) buffers, i_osif and o_osif. Figure 5 outlines the relationship between the OSFSM, the nested state machine implementing the mutex_lock procedure, and the two FIFO buffers. Synchronization between the nested state machines and the OSFSM is controlled via the handshaking signal done. For communicating with the delegate thread, we use a protocol that encodes an OS request as a sequence of words comprising a function identifier and a call-specific number of parameters. The encoded request is written to the outgoing FIFO o_osif. For a hardware thread, a function call is completed when the delegate thread has sent an acknowledgement and, optionally, a return value has been read from the incoming FIFO i_osif.
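The request protocol can be illustrated with a short Python model of the two FIFOs. This is a sketch under stated assumptions: the function identifiers, the 32-bit word width, the parameter counts, and the mailbox handle are all hypothetical, since the text specifies only that a request consists of a function identifier followed by a call-specific number of parameters.

```python
from collections import deque

# Function identifiers are hypothetical; the article does not list them.
FUNC_IDS = {"mutex_lock": 0x01, "mutex_unlock": 0x02, "mbox_put": 0x03}
# Call-specific number of parameter words per function (also assumed).
PARAM_COUNT = {0x01: 1, 0x02: 1, 0x03: 2}
WORD_MASK = 0xFFFFFFFF  # assume 32-bit FIFO words


def encode_request(name, *params):
    """Encode an OS request as [function id, param0, param1, ...]."""
    return [FUNC_IDS[name]] + [p & WORD_MASK for p in params]


def delegate_service(o_osif, i_osif, handlers):
    """Delegate side: decode one request and push the return value."""
    func_id = o_osif.popleft()
    params = [o_osif.popleft() for _ in range(PARAM_COUNT[func_id])]
    i_osif.append(handlers[func_id](*params) & WORD_MASK)


def demo():
    """Send one mbox_put request through the FIFOs and read the reply."""
    o_osif, i_osif = deque(), deque()
    mailboxes = {}

    def mbox_put(handle, word):
        mailboxes.setdefault(handle, []).append(word)
        return 0  # status word returned to the hardware thread

    o_osif.extend(encode_request("mbox_put", 7, 0xCAFE))
    wire = list(o_osif)                    # words as seen on o_osif
    delegate_service(o_osif, i_osif, {0x03: mbox_put})
    return wire, i_osif.popleft(), mailboxes
```

Because the parameter count depends on the function identifier, the receiver must read the identifier first and only then knows how many further words belong to the request.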
Hardware threads reside in reconfigurable slots, which are predefined areas of reconfigurable logic equipped with the necessary communication interfaces. Figure 6 shows an instance of a ReconOS hardware architecture with a CPU, two reconfigurable slots, the memory subsystem, and various peripherals. Besides communicating with the OS kernel on the host CPU, hardware threads residing in reconfigurable slots can also access the system memory. To that end, a hardware thread uses its memory interface (MEMIF), shown in Figure 2, to connect to the ReconOS memory subsystem. The memory subsystem arbitrates and aligns the hardware threads’ memory requests and can handle single-word as well as burst accesses. To support Linux with virtual addressing as host OS, ReconOS implements a full-featured memory management unit (MMU), including a translation look-aside buffer, which can autonomously translate addresses using the Linux kernel’s page tables [1].

Hardware threads use FIFO buffers to communicate with the memory subsystem: one outgoing and one incoming FIFO buffer per hardware thread. Requests for memory transactions are encoded and written to the outgoing FIFO buffer, followed by data in the case of a write request. In the case of a read request, data become available on the incoming FIFO buffer upon completion of the memory transfer. Similar to the communication with the OS, we provide a library of VHDL procedures to conveniently handle memory operations. These procedures encode the requests, synchronize with the memory FIFO buffers, and automatically transfer data to and from local memory elements within the hardware thread.
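The interplay of a translation look-aside buffer and a page table can be sketched in a few lines. The Python model below is a deliberate simplification: it assumes 4 KiB pages, a flat single-level page table, and first-in, first-out TLB replacement, none of which is stated in the text (Linux page tables are multi-level in practice).

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


class Mmu:
    """Toy MMU: a small TLB in front of a flat page table (both assumed)."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table  # virtual page number -> frame number
        self.tlb = {}                 # cached translations
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return the physical address for a virtual address."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpn in self.tlb:
            self.hits += 1
        else:
            self.misses += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            if len(self.tlb) >= self.tlb_entries:
                # Evict the oldest entry (dicts keep insertion order).
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpn] = self.page_table[vpn]  # result of the table walk
        return (self.tlb[vpn] << PAGE_SHIFT) | offset
```

A second access to the same page is served from the TLB without consulting the page table, which is what lets burst accesses within one page proceed without repeated table walks.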
Figure 4. Conceptual overview of the ReconOS system architecture. Software threads interact directly with the OS kernel, while hardware threads connect through an OS interface (OSIF) and delegate threads. [Diagram layers: application software (software threads, delegate threads); OS kernel (POSIX API, other libraries such as networking and math, scheduler, mutexes, semaphores, drivers, dynamic memory management); hardware (main memory, peripherals, hardware threads with OSIFs).]

Application development with ReconOS

Over the years, ReconOS has been used to implement several applications on hybrid CPU/FPGA systems. These experiences have confirmed that the hybrid multithreading approach offered by ReconOS simplifies the development process, which is typically structured in three steps. First, the developer prototypes the application’s functionality in multithreaded software using, for example, the Pthreads library on Linux. This first software-based implementation allows for functional testing. Second, the multithreaded software is ported to the embedded CPU on the targeted platform FPGA, such as a MicroBlaze running Linux. The developer can then use profiling to identify the application’s potential for parallel execution, that is, those threads that could benefit from the fine-grained parallelism of a hardware realization, and those code segments that are amenable to a coarser-grained parallel implementation with multiple threads. The third step includes creating the hardware threads and the ReconOS system architecture. At this point, ReconOS easily lets the developer evaluate different mappings of threads to hardware and software and quickly assess the overall performance on the target system.
ReconOS tool flow

Figure 7 captures the ReconOS v3 tool flow. The required sources comprise the software threads, the hardware threads, and the specification of the ReconOS hardware architecture. We code software threads in C and hardware threads in VHDL, using the ReconOS-provided VHDL libraries for OS communication and memory access. An automatic synthesis of hardware threads is not part of the ReconOS project; developers are, however, free to use any hardware description language or high-level synthesis tool to create hardware threads. ReconOS extends the process for building a reconfigurable system on a chip using standard vendor tools. On the software side, the delegate threads and device drivers for transparent communication with hardware threads are linked into the application executable and the kernel image, respectively. On the hardware side, components such as the OS and memory interfaces, as well as support logic for hardware threads, are integrated into the tool flow. The ReconOS System Builder assembles the base system design and the hardware threads into a reference design and automatically connects bus interfaces, interrupts, and I/O. The build process then creates an FPGA configuration bitstream for the reference design using conventional synthesis and implementation tools.
During design-space exploration, the developer will create both hardware and software implementations for some of the threads. Switching between these implementations is a matter of replacing a single thread instantiation statement, for example, using rthread_create() instead of pthread_create(). Such a decision for software or hardware can even be made during runtime (see the “Applications of ReconOS” sidebar).
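The effect of swapping a single instantiation statement can be mimicked in a small Python model. This sketch does not use the ReconOS API: the "sw" and "hw" entries are hypothetical stand-ins for two implementations of the same thread function, and both run as ordinary Python threads. The point is only that the mapping of a thread to an implementation is a one-word choice that leaves the rest of the application unchanged.

```python
import threading

# Hypothetical stand-ins for two implementations of the same thread function.
IMPLEMENTATIONS = {
    "sw": lambda chunk: ("software", sum(chunk)),
    "hw": lambda chunk: ("hardware", sum(chunk)),  # same result, other target
}


def run_mapping(mapping, chunk):
    """Start one thread per entry of the mapping and collect the results."""
    results = {}

    def body(name, impl):
        results[name] = IMPLEMENTATIONS[impl](chunk)

    threads = [threading.Thread(target=body, args=(name, impl))
               for name, impl in mapping.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Changing {"observation": "sw"} to {"observation": "hw"} switches the implementation while the functional result stays the same, which is what makes comparing mappings cheap.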
Case study: Video object tracker

To illustrate the benefits of the ReconOS approach, we present a particle-filter-based video object tracker for continuous estimation of an object’s position and size in a video sequence [2]. A particle filter is a robust technique for video object tracking because it maintains several estimates (particles) for the position and size of the tracked object. The filter iterates over video frames and processes the particles in three consecutive stages:
1. Sampling estimates where the object might have moved.
2. Importance weights all estimated particles by comparison with the observed next video frame.
3. Resampling eliminates low-weighted particles and duplicates high-weighted ones to create the particle set for the next filter iteration.
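The three stages can be illustrated with a toy one-dimensional particle filter. This Python sketch is not the tracker from the article: the Gaussian motion model, the weighting function, and the scalar "observation" are assumptions chosen to keep the example short, whereas the actual tracker compares color histograms of image regions.

```python
import random


def sample(particles, rng, spread=1.0):
    """Stage 1: propose where the object might have moved."""
    return [p + rng.gauss(0.0, spread) for p in particles]


def importance(particles, observation):
    """Stage 2: weight each particle by its closeness to the observation."""
    weights = [1.0 / (1.0 + (p - observation) ** 2) for p in particles]
    total = sum(weights)
    return [w / total for w in weights]


def resample(particles, weights, rng):
    """Stage 3: drop low-weight particles and duplicate high-weight ones."""
    return rng.choices(particles, weights=weights, k=len(particles))


def track(observations, n_particles=200, seed=0):
    """Run one filter iteration per observation; return the mean estimates."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    estimates = []
    for obs in observations:
        particles = sample(particles, rng)
        weights = importance(particles, obs)
        particles = resample(particles, weights, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

Each particle is handled independently in the sampling and importance stages, which is the property that allows these stages to be split across several thread instances.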
Figure 6. Example of a ReconOS hardware architecture with a CPU, two reconfigurable hardware slots, a memory subsystem, and various peripherals. Hardware threads reside in reconfigurable hardware slots and can access the OS kernel on the CPU via the OSIF and system memory via the MEMIF. [Diagram labels: CPU (software thread, delegate threads, ReconOS, Linux); reconfigurable slots 0 and 1 (hardware thread, OSFSM, OSIF, MEMIF); memory subsystem (arbiter, MMU, burst generator); system bus; memory; ICAP; Ethernet; other peripherals (USB, UART, ...).]
For our implementation, we start with an existing video object tracker implemented in C [3]. First, we transform the monolithic code into a multithreaded implementation on a desktop using POSIX Pthreads under Linux. Each filter stage can be naturally turned into a software thread, and the particles, grouped into chunks, are forwarded between the filter stages via message boxes. Because the particles are independent and thus can be processed in parallel, each stage is represented by multiple thread instances exploiting data parallelism. Second, we port our multithreaded software implementation from the desktop to the CPU embedded in a Xilinx FPGA. Video data is streamed from the desktop to the FPGA via Ethernet. Overall, this step requires little effort because both platforms offer the same OS and APIs. Third, we profile the execution times of all filter stages and confirm that the execution times strongly depend on the input data because the filter computes color histograms in variable-sized regions of interest, in which the tracked object is searched. We identify two functions that are typically performance-critical, namely color histogram computation (observation, o) and color histogram comparison (importance, i), and implement hardware thread versions for both functions.
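The two performance-critical functions can be sketched as follows. This Python example is a simplified stand-in, not the tracker's code: it uses single-channel 8-bit intensities instead of color values, and it assumes histogram intersection as the comparison measure, which the text does not specify.

```python
def color_histogram(pixels, bins=8):
    """Observation: normalized histogram of 8-bit values in a region."""
    hist = [0] * bins
    for value in pixels:
        hist[value * bins // 256] += 1
    total = len(pixels) or 1          # avoid division by zero for empty regions
    return [h / total for h in hist]


def similarity(hist_a, hist_b):
    """Importance: histogram intersection, 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

The cost of color_histogram() grows with the number of pixels in the region, which is consistent with the observation that execution time depends on the size of the region of interest.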
Using the hardware threads for observation and importance as well as the multithreaded software implementation, we perform a swift design-space exploration measuring the required computational effort for a given video sequence using hardware/software mappings with different resource requirements. Figure 8 shows the required computational effort in execution time per frame of various mappings for tracking a soccer player. The tracker that achieves the highest performance is the one that employs four hardware threads, two for observation and two for importance (mapping hwooii). Clearly, the required effort decreases when the object moves into the background. There, mapping hwi with a single hardware thread for importance achieves comparable performance results.
Among the existing OS approaches for reconfigurable computers, ReconOS stands out by providing a deep semantic integration of hardware accelerators into an OS environment while leveraging standard OS kernels. Hardware threads can access a rich set of OS functions, making them essentially identical to software threads with respect to OS interaction. Consequently, hardware threads can easily be exchanged for software threads and vice versa, which allows for rapid design space exploration at design time and even migration of functions across the hardware/software border at runtime. The use of standard OS kernels in ReconOS leads to a structured design process starting with a (possibly monolithic) software implementation, as well as to improved portability. Our experience shows that these features can significantly lower the entry barrier for reconfigurable computing technology.
Figure 8. Design-space exploration for a video object tracker: The graph shows the computational effort for tracking versus time in video frames for a specific video (taken from Hess [3]). The individual curves represent ReconOS implementations with different hardware/software mappings, where “sw” denotes an all-in-software system, and curves labeled “hw” denote systems with one to four threads of type observation (o) and importance (i) running in reconfigurable hardware. [Plot: x-axis, frame (0 to 400); y-axis, million clock cycles per frame (0 to 45); curves: sw, hwo, hwoo, hwi, hwii, hwoi, hwooi, hwooii.]

Acknowledgments

This work was partially supported by the German Research Foundation (DFG) within the Collaborative Research Centre “On-The-Fly Computing” (SFB 901), the International Graduate School of Dynamic Intelligent Systems, and the European Union Seventh Framework Programme under grant agreement 257906 (EPiCS).
References

1. A. Agne, M. Platzner, and E. Lübbers, “Memory Virtualization for Multithreaded Reconfigurable Hardware,” Proc. Int’l Conf. Field Programmable Logic and Applications (FPL 11), 2011, pp. 185-188.
2. M. Happe, E. Lübbers, and M. Platzner, “A Self-Adaptive Heterogeneous Multi-core Architecture for Embedded Real-Time Video Object Tracking,” J. Real-Time Image Processing, vol. 8, no. 1, 2013, pp. 95-110.
3. R. Hess, “Particle Filter Object Tracking,” blog, May 2013, http://blogs.oregonstate.edu/hess/code/particles.
Andreas Agne is a PhD student in the Computer Engineering Group at the University of Paderborn. His research interests include reconfigurable computing and operating systems for heterogeneous multicore architectures. Agne has a Diploma in computer science from the University of Paderborn.

Markus Happe is a senior researcher at the Communication Systems Group at ETH Zurich. His research interests include networking architectures, self-adaptation strategies, and reconfigurable systems. Happe has a PhD in computer science from the University of Paderborn.

Ariane Keller is a PhD student in the Communication Systems Group at ETH Zurich. Her research interests include computer architectures for self-organizing networks. Keller has a Diploma in electrical engineering from ETH Zurich.

Enno Lübbers is a senior researcher at the Intel Open Lab in Munich, which is part of Intel Labs Europe. His research interests include adaptive systems and heterogeneous architectures for high-performance, embedded, and safety-critical applications. Lübbers has a PhD in computer engineering from the University of Paderborn.

Bernhard Plattner is a full professor of computer engineering in the Department of Information Technology and Electrical Engineering at ETH Zurich, where he leads the Communication Systems Group. His research interests include self-organizing networks, mobile and opportunistic networking, and practical aspects of information security. Plattner has a PhD in computer engineering from ETH Zurich.

Marco Platzner is professor of computer engineering in the Department of Computer Science at the University of Paderborn. His research interests include reconfigurable computing, hardware-software codesign, and parallel architectures. Platzner has a PhD in telematics from Graz University of Technology.

Christian Plessl is assistant professor of custom computing in the Department of Computer Science at the University of Paderborn. His research interests include parallel and reconfigurable computer architectures, high-performance computing, and adaptive computing systems. Plessl has a PhD in computer engineering from ETH Zurich.

Direct questions and comments about this article to Christian Plessl, University of Paderborn, Department of Computer Science, Warburger Str. 100, 33098 Paderborn, Germany; [email protected].