ALgorithm parallelization for Multicore Architectures
Faster time-to-market for embedded multicore systems with less application development effort

WWW.ALMA-PROJECT.EU

HIDE THE COMPLEXITY BY THE ALMA TOOL FLOW
KEEP IT SIMPLE FOR THE PROGRAMMER

The ALMA ToolFlow aims to
- Hide the complexity of the underlying hardware from the programmer
- Provide a new approach for compiling annotated Scilab code to MPSoC architectures
- Develop a unified SystemC simulation framework for MPSoCs
- Develop algorithms and tools for
  - High-level, platform-independent application code performance estimation and optimization
  - Identification of possible partitions and placing & routing on different underlying architectures
  - Data type binding and data-level parallelization

[Figure: ALMA design flow overview. Labels: Embedded application design; Multi-core hardware design; Translation to Scilab & pragmas; Abstract hardware description (ADL); Structural hardware description; ALMA algorithm parallelization tools; Parameters for algorithm optimization; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; Executable binary (for simulator and HW); Multi-core simulator; Feedback for optimization; Optimized application code on multi-core platform]

Application Input Language (Scilab)
- ALMA dialect of the Scilab language
- Subset of Scilab language
- Extended by a preprocessing language
  - Variables declaration
  - Static types specification
  - Maximum size of vector/matrix data type definition
- Extended by an annotation language for supporting parallelism extraction

Architecture Description Language (ADL)
- Enables target independence of the toolchain
- Used as architecture description for the simulator
- Enables design-space exploration
- Compact specification of regular MPSoC structure
- Structural specification annotated with behavioural information
- Hierarchical module description for mixed-accuracy simulation support

ADL Compiler
- Compiles and analyses the architecture description
- Extracts high-level information from the ADL description (e.g. number of cores, communication bandwidth, available memories)
- Flattens hierarchical description

ALMA Front-End Tools
Scilab Front-End (SAFE)
- Parses Scilab source code and produces a high-level intermediate representation (HLIR) expressed in C
ALMA profiler (aprof)
- Early performance estimation at the HLIR level
High-Level Optimizer (HLO)
- Applies platform-independent optimizations to the HLIR

Coarse-Grain Parallelism Extraction
- Responsible for global optimization
- Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)
- High-level parallelization transformations to increase schedulable parallelism
- HTG partitioning to cores
- Optimal mapping and scheduling of tasks to architecture resources
- Iterative optimization by using task and communication profiling

Fine-Grain Parallelism Extraction
Floating point to fixed point
- No hardware support for FP in embedded multi-core systems
- Provide an automated floating- to fixed-point conversion tool
SIMD/SWP parallelization
- Loop parallelization and layout optimization for SIMD instructions
- Explore performance vs. accuracy trade-off in fixed-point encodings

Parallel Code Generation
- Generates target-specific C code
- Maps Scilab variables to memory locations
- Expresses communication and SIMD instr.
- Instrumentation for profiling
- Uses Recore/Kahrisma C compiler
- Utilizing native MPI libraries
- Generates executable for the hardware and simulator

Multicore Architecture Simulation
- Simulation of ALMA target architectures
- Retargetable
  - Structure defined by ADL
  - Implementation by library of SystemC modules
- Mixed-accuracy simulation
  - Behavioural or cycle-accurate
  - For individual modules
- Enables task and communication profiling

Multicore Architecture Test Cases
- Recore Systems' Multi-core DSP Platforms
- KIT's KAHRISMA Architecture

Application Test Cases
Image Processing
- Object recognition and multi-object tracking
- Use of Scale Invariant Feature Transform (SIFT)
Telecommunication
- IEEE 802.16e PHY Layer in NT x NR MIMO Configuration
- State-of-the-art WiMAX wireless communication

[Figure: The ALMA Tool Chain. Labels: Annotated Scilab Code; ADL; HLIR; ALMA IR; Annotated C Code; C Code + Back-Annotation; Binary; Profile Information; JSON; Iterative Optimization]

Coordinator: Jürgen Becker (KIT)
Contact: [email protected]
Budget: 3,200,000 €
Start Date: 01/09/2011
Duration: 36 Months

This work is co-funded by the European Union under the 7th Framework Programme under grant agreement ICT-287733.
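The floating- to fixed-point conversion and the performance vs. accuracy trade-off listed under Fine-Grain Parallelism Extraction can be illustrated with a small hand-written C sketch. It is not output of the ALMA tools: the function names (`to_fixed`, `to_double`, `fixed_mul`, `max_abs_error`), the 16-bit word length, and the rounding and saturation policy are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: 16-bit signed fixed point with `frac_bits`
   fractional bits (1..15). Not part of the ALMA tool flow. */

/* Quantize a real value: round half away from zero, saturate to 16 bits. */
int16_t to_fixed(double x, int frac_bits)
{
    assert(frac_bits >= 1 && frac_bits <= 15);
    double scaled = x * (double)(1 << frac_bits);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 32768.0)  return INT16_MAX;
    if (scaled <= -32769.0) return INT16_MIN;
    return (int16_t)(int32_t)scaled;   /* truncation completes the rounding */
}

/* Convert a fixed-point value back to a real value. */
double to_double(int16_t q, int frac_bits)
{
    return (double)q / (double)(1 << frac_bits);
}

/* Multiply two values that share one format. The 32-bit product carries
   2*frac_bits fractional bits, so it is rounded and shifted back.
   Assumes >> on a negative int32_t is an arithmetic shift, which is
   implementation-defined in C but holds on common compilers. */
int16_t fixed_mul(int16_t a, int16_t b, int frac_bits)
{
    assert(frac_bits >= 1 && frac_bits <= 15);
    int32_t wide = (int32_t)a * (int32_t)b;
    wide += (int32_t)1 << (frac_bits - 1);
    wide >>= frac_bits;
    if (wide > INT16_MAX) return INT16_MAX;
    if (wide < INT16_MIN) return INT16_MIN;
    return (int16_t)wide;
}

/* Largest quantization error over a data set for a given format. */
double max_abs_error(const double *x, size_t n, int frac_bits)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = x[i] - to_double(to_fixed(x[i], frac_bits), frac_bits);
        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_abs_error` on the same data with different `frac_bits` values shows the accuracy side of the trade-off: more fractional bits lower the quantization error but leave fewer bits for the integer range, and narrower formats let more values fit into one SIMD register.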
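The task-to-core mapping step listed under Coarse-Grain Parallelism Extraction can be pictured with a deliberately simplified C sketch. It uses a plain greedy heuristic (each task goes to the currently least-loaded core), which is a stand-in and not the optimal mapping and scheduling the poster describes. It also assumes independent tasks with known costs and ignores the dependencies and communication that a Hierarchical Task Graph captures. The name `map_tasks_greedy` and the `MAX_CORES` limit are hypothetical.

```c
#include <stddef.h>

#define MAX_CORES 16   /* illustrative upper bound, not an ALMA parameter */

/* Greedy mapping sketch.
   cost[i]    : estimated execution time of task i (e.g. from profiling)
   core_of[i] : output, index of the core chosen for task i
   Returns the load of the busiest core, or -1 if n_cores is out of range.
   Passing tasks sorted by decreasing cost gives the classic
   longest-processing-time-first heuristic. */
long map_tasks_greedy(const long *cost, size_t n_tasks,
                      int n_cores, int *core_of)
{
    long load[MAX_CORES] = {0};

    if (n_cores < 1 || n_cores > MAX_CORES)
        return -1;

    for (size_t i = 0; i < n_tasks; i++) {
        int best = 0;
        /* Pick the least-loaded core; ties go to the lowest index. */
        for (int c = 1; c < n_cores; c++)
            if (load[c] < load[best])
                best = c;
        core_of[i] = best;
        load[best] += cost[i];
    }

    long busiest = 0;
    for (int c = 0; c < n_cores; c++)
        if (load[c] > busiest)
            busiest = load[c];
    return busiest;
}
```

For example, costs `{7, 5, 4, 3, 1}` on two cores end up balanced at a load of 10 per core. A profiling-driven flow such as the one on the poster would re-run a step like this with measured task and communication costs.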