Accelerating Your Success™ V10_1_2_0 Avnet Speedway Design Workshop ™ Lecture 2: System Prototyping with the Avnet Spartan-3A DSP FPGA DaVinci Development Kit
Model-Based Design Flow
• Develop Executable Spec in Simulink
• Partition Between DSP and FPGA Co-Processor
• Design Exploration for Targeting Hardware
• Verify Hardware in HW Co-simulation
• Implement Stand-Alone Video System
The Problem We Wish to Solve
High-level behavioral models are great for expressing ideas and prototyping quickly. Models also provide an executable specification or reference design that can be used for verification.
However, as we move closer to implementing the model, we need to elaborate it with details about the target hardware architecture.
Agenda
• Overview of TI DSP devices and design flow
• Design exploration for targeting hardware
• Overview of Real-Time Workshop Embedded Coder C code generation
• Integrating with the TI DSP design flow
Tools Overview
[Figure: tool flow for the Avnet Spartan-3A DSP DaVinci Development Kit (DaVinci DM6437, Spartan®-3A DSP 3SD1800A), with Generate and Verify paths between the tools.]
• MathWorks: MATLAB® (Embedded MATLAB, Toolboxes); Simulink® (Embedded MATLAB & C, Blocksets); Real-Time Workshop Embedded Coder, IDE Link CC, Target TC6
• TI: Code Composer Studio (C & ASM)
• Xilinx: ISE (HDL)
• Avnet: Spartan-3A DSP DaVinci Development Kit
• Verification: software co-simulation and hardware co-simulation
Which TI device is best for me?
[Figure: TI device families grouped under application processing, low power processing, and video processing. Devices shown: OMAP3503, OMAP3515, OMAP3525, OMAP3530, OMAP-L1, C550x, C640x, C674x, DM335, DM355, DM643x, DM6437, DM644x, DM647, DM648, DM6467, “DM3xx Next”, “DM64xx Next”.]
13 different products and suites of products are shown, including many products for video:
• Applications processing with OMAP35xx
  – Highest performance ARM + graphics
  – First to market with Cortex-A8
  – Up to 600 MHz ARM Cortex-A8 (~1200 ARM9 MIPS)
  – Up to 10 million polygons/second with graphics accelerator
• DM355
  – Low price for HD, $10-$15 range depending on volume
  – MPEG4 HD video, JPEG
  – Up to 270 MHz ARM9
• DM644x
  – Up to 720p video decode
  – Up to 600 MHz C64x+ DSP + video accelerator performance
  – Four 10-bit video DACs supporting composite, component, or S-Video
TI Video device capabilities
[Figure: TI video devices plotted against resolution (480, 720, 1080), with a legend for Production, Sampling, In Development, and Future Device.]
• DM644x: H.264, MPEG2, MPEG4, VC1; OSD capable
• DM643x (lower cost; includes DM6437): H.264 enc or dec; MPEG2 dec; MPEG4 enc or dec; VC1 dec
• DM647/8 (multi-channel, multi-SD): H.264 BP, MPEG2, MPEG4; multi-video interface; VC1 dec
• DM6467 (HD): H.264 HP, MPEG-4, VC1, MPEG2; multi-SD enc & dec; 1080p 30 fps dec, 720p enc or dec
• OMAP3530/OMAP3525 (65 nm): MPEG4 720p enc or dec; H.264 MP VGA decode; H.264 BP/VC1/WMV9 D1 enc or dec
• DM355 (90 nm): MPEG4 720p enc or dec
• “DM64xx Next” and “DM3xx Next”
Code Composer Studio
TI Code Composer Studio™ IDE
• Project Manager
• Editor
• Disassembly
• Memory Registers
• RTA: Execution Graph
• Graphing: Eye Diagram, CPU Load Graph
• Message Log
• Statistics
• Watch Windows
DSP/BIOS Concept Slide
• DSP/BIOS is an RTOS that provides run-time services which developers use to build DSP applications and manage application resources.
• DSP/BIOS provides real-time, run-time kernel services that form the underlying architecture, or infrastructure, of real-time DSP applications.
• The DSP/BIOS kernel tightly integrates with the Code Composer Studio Integrated Development Environment (IDE) to provide the ability to:
  – Select and configure the foundation modules and kernel objects required by the application with the DSP/BIOS Configuration Tool
  – Provide DSP/BIOS kernel object viewing with the Code Composer Studio (CCStudio) plug-in utility
  – Support the real-time analysis features in the DSP/BIOS kernel with host-side tooling
DSP/BIOS OS & Tools
[Figure: the host development computer runs Code Composer Studio (configuration, build, debug, visualization) and connects through JTAG emulation and RTDX to the target TMS320 DSP hardware. Program sources, kernel APIs, and kernel modules are built into the executable image; the target runs the instrumented DSP application with DSP/BIOS real-time analysis.]
• Configuration
  – Graphical or script-based OS configuration
  – Easily select only the modules required
  – Static creation of kernel data structures
• Kernel
  – Deterministic, multithreading kernel
  – Preemptive scheduler
  – Debug version builds in instrumentation
  – Scalable to minimal footprint
• Analysis
  – Graphical analysis & debug tools
  – Examine state of OS objects
  – Real-time capture of execution history, CPU load, & thread performance
Interrupt Handling
• DSP/BIOS includes an interrupt dispatcher that can handle all interrupts coming into the device
  – The dispatcher is highly optimized assembly code that performs operations such as context save/restore and disabling/enabling preemption
• Interrupt handlers can be written in C
• The dispatcher supports muxing of 64+ device interrupt pins to multiple interrupt sources
Real-time Analysis
• CPU Load
• Message Logs
• Thread Statistical Information
• Execution Graph (Software Logic Analyzer)
[Figure: DSP/BIOS configuration (CDB) showing kernel objects in graphical configuration and textual configuration views.]
Kernel Modules
Module | Description
HWI | Interface from hardware interrupts to kernel via dispatcher or macros
SWI | Preemptible thread that uses program stack but cannot yield
TSK | Independent, preemptible thread of execution that has its own stack and can yield the processor
PRD | Time-triggered SWI
MSGQ | Variable-length transparent message passing
MBX | Mailboxes for synchronized fixed-sized data exchange between tasks
LCK | Nestable semaphore with concept of ownership
SEM | Counting semaphore
Kernel Modules
Module | Description
QUE | Atomic linked lists
CLK | Interface to hardware timers
GIO | Extensible I/O with support for asynchronous I/O & synchronous read/write
SIO | Streaming I/O
MEM | Heap manager
BUF | Deterministic fixed-sized buffer allocation
Real-time Analysis Modules
Module | Description
LOG | Low-overhead ‘printf’ or event logging to a buffer
STS | Statistics such as number of times called, average execution time, and maximum execution time
HST | Stream data to/from desktop host computer system
Agenda
• Overview of TI DSP devices and design flow
• Design exploration for targeting hardware
• Overview of Real-Time Workshop Embedded Coder C code generation
• Integrating with the TI DSP design flow
Design Exploration for targeting hardware
• Convert from floating to fixed-point data types
• Model the dataflow for your hardware:
  – Patch / ROI processing for DSP
  – Line buffers for FPGAs
  – Streaming pixel processing for FPGAs
• Modeling data organization (row major vs column major)
• Partition algorithm between DSP / FPGA
• Use blocks that can create the code you want:
  – Video and Image Processing Blockset -> C code
  – TI IMGLIB -> ASM code
  – Xilinx System Generator for DSP -> HDL code
  – Custom C / HDL code
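The line-buffer idea for streaming pixel processing can be sketched in a few lines of Python. This is only an illustration, assuming pixels arrive one at a time in raster order; the function name `vertical_sum_stream` and the 3-row vertical sum are hypothetical choices, not part of any blockset or of the workshop labs.

```python
def vertical_sum_stream(pixels, width):
    """Sum each pixel with the two pixels directly above it, one pixel at a time.

    Only two line buffers of `width` entries are kept, never the whole frame.
    Rows above the top of the frame are treated as zeros.
    """
    previous_row = [0] * width      # line buffer: row r-1
    row_before_that = [0] * width   # line buffer: row r-2
    out = []
    for i, pixel in enumerate(pixels):
        col = i % width
        out.append(row_before_that[col] + previous_row[col] + pixel)
        # Shift the column down through the two line buffers.
        row_before_that[col] = previous_row[col]
        previous_row[col] = pixel
    return out


# A 3-row by 2-column frame streamed in raster order.
stream = [1, 2,
          3, 4,
          5, 6]
print(vertical_sum_stream(stream, width=2))  # [1, 2, 4, 6, 9, 12]
```

The memory cost is two rows regardless of frame height, which is why this dataflow suits an FPGA, while a DSP would more likely fetch a patch or ROI into local memory and process it as a block.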
Fixed-point design challenges
• Finite word lengths introduce quantization error
  – Overflow (overload distortion): data beyond range of fixed-point data type
  – Underflow (granular noise): not enough fractional bits for exact data match
• Properly scale input, output, and intermediate quantities
• Minimize error propagation of signals and parameters
Bit weights: s … 32 16 8 4 2 1 1/2 1/4 1/8 1/16 1/32 …
7+1=8 bit word length & 5 fractional bits: Range = [-4, 3.96875], Step = 1/32
7+1=8 bit word length & 1 fractional bit: Range = [-64, 63.5], Step = 1/2
FPGAs are inherently fixed-point machines. There are cores that are floating-point capable, but they add overhead. We instead choose “budget math” and need to contend with the above challenges to arrive at accurate results.
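The two ranges above can be checked with a short Python sketch. The helper names are hypothetical, and saturating on overflow is an assumption made for illustration; real hardware or generated code may wrap instead.

```python
def fixed_point_limits(word_length, frac_bits, signed=True):
    """Return (min, max, step) for a binary-point-scaled fixed-point type."""
    step = 2.0 ** -frac_bits
    lo_int = -(1 << (word_length - 1)) if signed else 0
    hi_int = (1 << (word_length - 1)) - 1 if signed else (1 << word_length) - 1
    return lo_int * step, hi_int * step, step


def quantize(value, word_length, frac_bits, signed=True):
    """Round to the nearest step, then saturate at the type limits (assumed policy)."""
    lo, hi, step = fixed_point_limits(word_length, frac_bits, signed)
    stored = round(value / step) * step
    return max(lo, min(hi, stored))


print(fixed_point_limits(8, 5))  # (-4.0, 3.96875, 0.03125)
print(fixed_point_limits(8, 1))  # (-64.0, 63.5, 0.5)
print(quantize(5.0, 8, 5))       # overflow: saturates to 3.96875
print(quantize(0.01, 8, 5))      # underflow: rounds to 0.0
```

Moving the binary point trades range for resolution: with 5 fractional bits the value 5.0 overflows, while with 1 fractional bit it fits but anything finer than 1/2 is lost.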
Fixed-point design solutions
• Data type propagation
• Port data type visualization
Binary point scaling representation: Signed | Integer | Word Length | Direction | Scale
Example: ufix16_E2 is an unsigned fixed-point number with a 16-bit word and 2 bits of positive scaling. Range: [0, 262140]. Precision: 4.
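As a rough sketch, a name such as ufix16_E2 can be decoded into its range and precision in Python. The pattern below is an assumption inferred from the slide (`s`/`u` for sign, the word length, then `E` for positive scaling); treating `En` as negative scaling is also an assumption, and `decode_fixed_point_name` is a hypothetical helper, not a MathWorks function.

```python
import re

# Assumed pattern: [s|u]fix<word length>, optionally _E<bits> or _En<bits>.
_NAME = re.compile(r"^([su])fix(\d+)(?:_E(n?)(\d+))?$")


def decode_fixed_point_name(name):
    """Return sign, word length, precision, and range for a data type name."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized data type name: {name}")
    sign, word_length, negative, exponent = match.groups()
    word_length = int(word_length)
    exponent = int(exponent) if exponent else 0
    if negative:
        exponent = -exponent
    signed = sign == "s"
    precision = 2.0 ** exponent  # weight of the least significant bit
    lo_int = -(1 << (word_length - 1)) if signed else 0
    hi_int = (1 << (word_length - 1)) - 1 if signed else (1 << word_length) - 1
    return {
        "signed": signed,
        "word_length": word_length,
        "precision": precision,
        "min": lo_int * precision,
        "max": hi_int * precision,
    }


info = decode_fixed_point_name("ufix16_E2")
print(info["min"], info["max"], info["precision"])  # 0.0 262140.0 4.0
```

The maximum follows from (2^16 - 1) x 4 = 262140, which matches the range shown on the slide.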
Demo: Fixed-Point Workflow
• Modeling fixed-point data types for bit-true simulation
• Autoscaling to determine the optimum fractional settings (scaling) for DSP word lengths
Fixed-point design solutions
Set fixed-point data types for signals and blocks
Full manual control
Fixed-point design solutions
• Log Min, Max, and Overflow
• Override data types with double precision
• View fixed-point log and scaling recommendations
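The log-then-scale workflow can be illustrated with a small Python sketch. It assumes the signal was first simulated in double precision and its extremes recorded; `RangeLog` and `propose_frac_bits` are hypothetical names, and the selection rule (the most fractional bits that still cover the logged range) is a simplification of what an autoscaling tool does.

```python
class RangeLog:
    """Track the minimum and maximum of a signal during a floating-point run."""

    def __init__(self):
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def record(self, value):
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        return value


def propose_frac_bits(log, word_length, signed=True):
    """Return the largest fraction length whose range covers the logged values."""
    lo_int = -(1 << (word_length - 1)) if signed else 0
    hi_int = (1 << (word_length - 1)) - 1 if signed else (1 << word_length) - 1
    # Search from fine to coarse; a negative result means positive scaling.
    for frac_bits in range(2 * word_length, -2 * word_length - 1, -1):
        step = 2.0 ** -frac_bits
        if lo_int * step <= log.minimum and log.maximum <= hi_int * step:
            return frac_bits
    raise ValueError("logged range does not fit this word length")


log = RangeLog()
for sample in [-3.2, 0.5, 3.7]:
    log.record(sample)
print(propose_frac_bits(log, word_length=8))  # 5
```

For an 8-bit signed word the logged range [-3.2, 3.7] needs two integer bits plus sign, leaving 5 fractional bits, the same format as the first example on the fixed-point challenges slide.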
Model the dataflow for your hardware
• Serial stream processing vs frame processing
• Patch Processing for efficient data movement
• Parallel Architecture
• Pipelining
• Row-Major vs Column-Major data organization
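Row-major and column-major organization differ only in how a (row, column) pair maps to a linear memory address. MATLAB stores arrays column-major while C stores them row-major, so the mapping matters when data crosses from the model to generated code. The helper names below are hypothetical and serve only to show the two index formulas.

```python
def row_major_index(row, col, n_rows, n_cols):
    """Linear address when each row is stored contiguously (C convention)."""
    return row * n_cols + col


def column_major_index(row, col, n_rows, n_cols):
    """Linear address when each column is stored contiguously (MATLAB convention)."""
    return col * n_rows + row


def flatten(frame, order):
    """Lay a 2-D frame out in linear memory using the chosen organization."""
    n_rows, n_cols = len(frame), len(frame[0])
    index = row_major_index if order == "row" else column_major_index
    flat = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            flat[index(r, c, n_rows, n_cols)] = frame[r][c]
    return flat


frame = [[1, 2, 3],
         [4, 5, 6]]
print(flatten(frame, "row"))     # [1, 2, 3, 4, 5, 6]
print(flatten(frame, "column"))  # [1, 4, 2, 5, 3, 6]
```

A raster-scan video stream matches the row-major layout, so a column-major model may need a transpose or a change of data organization before it matches the hardware dataflow.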
General Advice for Design Exploration
Targeting Embedded Processors
• Typical embedded processors support two data types:
  – Base data type: integer of the specified bit size of an embedded processor
  – Accumulator data type: integer that is twice the size of the base data type supported by an embedded processor
• The base data type is supported for basic simulation operations such as addition, subtraction, multiplication, delay, and shift.
• The accumulator data type is supported only for operations such as addition, subtraction, and delay, not multiplication.
General Advice for Design Exploration
Fixed-Point Rules for Targeting Embedded Processors
The following is a set of guidelines for data type selection when targeting a fixed-point processor that supports a base data type and an accumulator data type:
• Multiplications must use the base data type.
• Delays should use the base data type.
  – Use of the accumulator data type is costly because it is stored in memory from one time step to the next.
  – Delays usually feed to gains for multiplication.
• Temporary variables can use the accumulator data type.
  – They are stored temporarily in shared and reused memory like RAM or CPU registers.
• Summations can use the accumulator data type.
  – To reduce buildup of errors due to round off.
  – To prevent overflows.
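These rules can be seen in a multiply-accumulate loop. The sketch below models a hypothetical processor with a 16-bit base type and a 32-bit accumulator using Python integers and explicit two's-complement wrapping; `wrap` and `dot_product` are illustrative names, and wrap-on-overflow is an assumption.

```python
def wrap(value, bits):
    """Wrap an integer to a signed two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dot_product(samples, coeffs, base_bits=16, acc_bits=32):
    """Multiply base-type operands and sum the products in the accumulator."""
    acc = 0
    for x, c in zip(samples, coeffs):
        x = wrap(x, base_bits)  # delays and gains hold base-type values
        c = wrap(c, base_bits)
        acc = wrap(acc + x * c, acc_bits)  # summation uses the accumulator type
    return acc


samples = [300, 300]
coeffs = [100, 100]
print(dot_product(samples, coeffs))               # 60000
print(dot_product(samples, coeffs, acc_bits=16))  # -5536
```

Each product (30000) fits in 16 bits, but their sum does not. Summing in the 32-bit accumulator gives the exact result, while summing in the base type overflows and wraps.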
Agenda
• Overview of TI DSP devices and design flow
• Design exploration for targeting hardware
• Overview of Real-Time Workshop Embedded Coder C code generation
• Integrating with the TI DSP design flow
Model-Based Design supports both Software and Hardware systems
• Coders
  – Code generation from models
  – Language options
  – Code interfacing, optimization
• Links
  – Verification tool integration
  – Project generation, build, download
  – Co-simulation, SIL/PIL/HIL
• Targets
  – Processor & memory specific optimization
  – Device drivers, board support
  – Schedulers, RTOS integration
[Figure: MATLAB® and Simulink® (algorithm and system design) feed code generation with Real-Time Workshop Embedded Coder, IDE Link CC, and Target TC6. Generate and Verify paths connect to C / ASM for MCU and DSP and to VHDL / Verilog for FPGA.]
Real-Time Workshop® Embedded Coder
• Automatically generates C code from Simulink® models
• Code is ANSI/ISO-C compliant, so it can run on any microprocessor or real-time operating system (DSP/BIOS)
• Concisely partitions multi-rate code for efficient scheduling with or without an RTOS
• Provides commenting capabilities to trace code to models and requirements
• Verifies code by importing it into Simulink for software-in-the-loop testing
• Generates optimized code:
  – Automatic replacement of math functions and operators with target-specific implementations (TFL)
  – Eliminate unnecessary initialization, termination, logging, and error-handling code
  – Combine output/update functions to reduce code size
  – Remove floating-point code from integer-only applications
Demo: Generating C-Code
Real-Time Workshop Embedded Coder
Real-Time Workshop Embedded Coder uses target files to translate models into code that runs in a particular environment.
• You customize or use ready-to-run targets, including:
  – Optimized for floating-point code
  – Optimized for fixed-point code
  – Embedded Target products
You can generate code for any processor by specifying integer word sizes and other required target characteristics, by choosing from a list of targets with predefined settings, or by creating your own custom target.
Real-Time Workshop Embedded Coder
Automatic replacement of math functions and operators with target-specific implementations (TFL)
Agenda
• Design exploration for targeting hardware
• Overview of TI DSP architecture and Design Flow
• Overview of Real-Time Workshop Embedded Coder C-Code Generation
• Integrating with the TI DSP Design Flow
  – Embedded IDE Link CC
  – Target Support Package TC6
Connecting to TI Processors
[Figure: MATLAB & Simulink generate code through Real-Time Workshop Embedded Coder, IDE Link CC, and Target TC6 into Texas Instruments Code Composer Studio (C & ASM; compile & link, download, debug), with a Verify path back to the model.]
• Embedded IDE Link CC
• Target Support Package TC6
Embedded IDE Link Demo
One. Automation Interface: Use MATLAB scripts or Simulink models to automate verification and debugging tasks in CCS
Two. DSP Implementation: Automatically generate CCS projects using Real-Time Workshop Embedded Coder generated code
Three. PIL Verification: Processor-in-the-loop simulation to verify Simulink subsystems executing on target DSP with timing controlled by Simulink model
Four. Code Profiling: Real-time code profiling (stack and run time) with a graphical view and a detailed HTML report
DSP Implementation – Project Creation
[Figure: CCS project view with callouts: Source Files; Generated Code Successfully Built within CCS; Processor Specific Interrupt Handler and Timer Code.]
Project generation is the ability to create complete C projects from your Simulink models that can be built in CCS and executed on the DSP. Code generated by Real-Time Workshop EC can contain processor intrinsics. However, apart from the algorithm code, a typical DSP project contains other processor-specific files such as the scheduler or the real-time operating system, memory map files, and peripheral drivers. Link for CCS, in conjunction with RTW-EC, generates some of these processor-specific non-algorithmic files, which are key to executing the code on the DSP.
Files Added in Project Creation
Created by Real-Time Workshop Embedded Coder
• C and Header Files representing algorithms and systems as Simulink models
Created by Link for Code Composer Studio
• Memory Map – CMD files
• Real-Time Scheduler and Timer Code
• Interrupt Handling Code (with Hardware Interrupts blocks)
Here’s a summary of all the files generated during project generation.
P-I-L Verification
[Figure: the Simulink test bench (test signals, algorithm, verifications) connects through Link for CCS to the CCS test bench (PIL interface, algorithm code).]
The next important functionality of this product combines both the project generation and automated verification features.
Consider a Simulink model that has been tested in simulation using test signals and by visualizing the output.
The next step is to implement and verify just the algorithm subsystem on the target processor. RTW-EC can generate the algorithm code, but with the Link, one can also create the necessary test bench interface so that the original Simulink model now acts as a test harness for the algorithm code.
We already saw this in our first demo, which showed the Lane Detection subsystem running on the DM642 DSP. Let’s see another example, an ANC using an LMS filter, and this time we’ll go through the steps to see how straightforward this process is.
Q: If the code is compiled and tested in CCS once with optimization set to “Register (-o0)” and again with optimization set to “File (-o3)”, are both resulting output vectors guaranteed to be the same?
A: No. Optimization makes tradeoffs which may affect the results.
Q: If the code is compiled and tested using Microsoft Visual C++ and then using VisualDSP++, are the output vectors guaranteed to be the same?
A: No. In addition to differing optimization settings and schemes, the differing size of accumulators and overflow bits may produce different results for edge cases.
Q: Is a cycle-accurate simulator always cycle accurate?
A: No. Most of the time they are, but many customers call them “cycle approximate” simulators.
Q: Do you need to verify code developed and tested on a PC with code executing on the target embedded processor?
A: Yes!
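The accumulator-size answer can be made concrete with a small Python sketch. It models two hypothetical targets, one with a saturating 32-bit accumulator and one with a 40-bit accumulator whose result is saturated to 32 bits only at the end; the function names and the saturation behavior are assumptions for illustration, not a model of any specific compiler or processor.

```python
def saturate(value, bits):
    """Clamp an integer to the signed range of a `bits`-bit word."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def accumulate(products, acc_bits, out_bits=32):
    """Sum products in an `acc_bits` accumulator, then store an `out_bits` result."""
    acc = 0
    for product in products:
        acc = saturate(acc + product, acc_bits)
    return saturate(acc, out_bits)


# Edge case: the running sum briefly exceeds the 32-bit range.
products = [1 << 30, 1 << 30, -(1 << 30)]
narrow = accumulate(products, acc_bits=32)
wide = accumulate(products, acc_bits=40)
print(narrow)  # 1073741823
print(wide)    # 1073741824
```

The same source and the same inputs give outputs that differ by one count, because the narrower accumulator clipped an intermediate value. This is the kind of mismatch a processor-in-the-loop run exposes when the target output vector is compared against the simulation.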
Profiling real-time code execution on DSP
Code Profiling
• Uses DSP/BIOS statistics to measure execution time
• Profile Stack Usage
The code profiler in Target Support Package TC6 uses DSP/BIOS statistics objects to measure the execution time of code segments generated by individual subsystems. A code profile report helps you identify segments of generated code that are candidates for off-loading to an FPGA co-processor.
In-depth technical information on code profiling is available at:
http://www.mathworks.com/access/helpdesk/help/toolbox/tic6000/index.html?/access/helpdesk/help/toolbox/tic6000/f8-7016.html
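The kind of bookkeeping a statistics object performs (count, average, and maximum execution time) can be sketched in Python. This is not the DSP/BIOS API; `ExecutionStats`, `offload_candidates`, the subsystem names, and the cycle counts are all hypothetical and only illustrate how a profile report points to FPGA offload candidates.

```python
class ExecutionStats:
    """Accumulate a count, total, and maximum of measured execution times."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.maximum = 0

    def add(self, cycles):
        self.count += 1
        self.total += cycles
        self.maximum = max(self.maximum, cycles)

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


def offload_candidates(profile):
    """Rank subsystem names by total execution time, most expensive first."""
    return sorted(profile, key=lambda name: profile[name].total, reverse=True)


# Hypothetical per-frame cycle counts for two subsystems.
profile = {"edge_detect": ExecutionStats(), "overlay": ExecutionStats()}
for cycles in (900, 1100, 1000):
    profile["edge_detect"].add(cycles)
for cycles in (100, 120, 80):
    profile["overlay"].add(cycles)

print(profile["edge_detect"].average)  # 1000.0
print(profile["edge_detect"].maximum)  # 1100
print(offload_candidates(profile))     # ['edge_detect', 'overlay']
```

Keeping only a running count, total, and maximum makes the measurement cheap enough to leave in a real-time loop, which is the reason statistics objects are preferred over logging every sample.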
Target Support Package TC6
• Provides board support peripheral libraries
• Added support for optimized C-intrinsics (C callable ASM libraries)
Lab #2