Challenges in High-Performance Embedded Designs FINAL

Apr 09, 2018

David Fong

  • 8/8/2019 Challenges in High-Performance Embedded Designs FINAL

    1/39

    Challenges in High-Performance Embedded Designs

    Alec Bath, Application Engineer, STMicroelectronics

    Markus Mayr, Product Marketing Engineer, STMicroelectronics

  • 2/39

    1D Barcode Scanner

    Our current 1D Barcode Scanner is great - but:

    We need to reduce cost to stay competitive

    We need to add new features to react to customer requests: USB, RTOS, etc.

  • 3/39

    Original 1D Barcode Scanner

    [Block diagram: ARM7TDMI MCU (50 MHz, 128 KB Flash, 16 KB RAM) reading a CCD scan sensor through an external fast 12-bit ADC; UART / PS2 (GPIO) host link; user interface on GPIO]

  • 4/39

    Von Neumann Bottleneck

    [Block diagram: the ARM7TDMI and the DMA are both masters on the AHB1 and AHB2 buses, which connect Flash (up to 128 KB), SRAM and the EMI with slow peripherals and the user interface, and with fast peripherals (USB 1.1, external fast 12-bit ADC); the diagram is annotated "IRQ!!"]

  • 5/39

    1D Scanner design then and now

    [Block diagram: Cortex-M3 (Master 1; 72 MHz, 128 KB Flash, 20 KB RAM) with I-bus, D-bus and System bus, and the GP-DMA (Master 2), connected through the Flash I/F and bus matrix to the SRAM slave and to the AHB. The AHB reaches the peripherals through an arbiter and the AHB-APB2 and AHB-APB1 bridges. APB2: GPIOA-E, AFIO, USART1, SPI1, ADC1/2, TIM1, EXTI. APB1: USART2/3, SPI2, I2C1/2, TIM2/3/4, IWDG, WWDG, USB, CAN, BKP, PWR. The scanner's user interface and USB 1.1 link attach to these peripherals.]

  • 6/39

    Innovative System Architecture

    Harvard architecture + BusMatrix allows concurrent Flash execution and DMA transfer

    Advanced Peripherals to further offload the CPU

    Low-latency deterministic interrupt controller in the Cortex-M3 core

    75% lower power at the same clock speed as ARM7

    30% better code size via the Thumb-2 instruction set

    [Block diagram: the STM32 architecture from the previous slide. The Cortex-M3 (Master 1; 72 MHz, 128 KB Flash, 20 KB RAM) and the GP-DMA (Master 2) sit on the bus matrix, with the Flash I/F and SRAM slaves and the AHB-APB1/AHB-APB2 bridges to the peripherals.]

  • 7/39

    Cortex-M3 Harvard Architecture

    Instructions fetched over the I-bus

    [Block diagram: Cortex-M3 (Master 1) with I-bus, D-bus and System bus, and the GP-DMA, connected through a multi-layer bus matrix and arbiter to the Flash I/F, SRAM and AHB/APBx bridge slaves (USART / SPI / I2C / ADC / TIM)]

  • 8/39

    Cortex-M3 Harvard Architecture

    Instructions fetched over the I-bus

    While literals are fetched on the D-bus

    [Block diagram: Cortex-M3, GP-DMA, multi-layer bus matrix, Flash I/F, SRAM and AHB/APBx peripherals, as on the previous slide]

  • 9/39

    Cortex-M3 Harvard Architecture

    Instructions fetched over the I-bus

    While constants are fetched on the D-bus

    While the core reads a peripheral

    [Block diagram: Cortex-M3, GP-DMA, multi-layer bus matrix, Flash I/F, SRAM and AHB/APBx peripherals, as on the previous slide]

  • 10/39

    DMA & Cortex-M3 Data Flow

    Instructions fetched over the I-bus

    While literals are fetched on the D-bus

    While the core reads a peripheral

    While the DMA reads SRAM!

    [Block diagram: Cortex-M3, GP-DMA, multi-layer bus matrix, Flash I/F, SRAM and AHB/APBx peripherals, as on the previous slide]

  • 11/39

    The Cortex-M3 MCU Core

    High performance with low dynamic power
      - Harvard architecture
      - 30% performance improvement over ARM7TDMI
      - Single-cycle multiply
      - Hardware divide
      - Atomic bit manipulation

    Best code density
      - Thumb-2 brings 32-bit performance with 16-bit code density

    Deterministic interrupt handling
      - Interrupt controller inside the core, 12-cycle push / 12-cycle pop
      - Just 6-cycle latency for tail-chained interrupts

    Improved debug features
      - Serial Wire Viewer adds real-time data trace
      - ETM on select part numbers
      - 2 data watchpoints, 6 hardware breakpoints

  • 12/39

    What's Thumb-2?

    Thumb-2 is a new ARM instruction set that mixes 16-bit and 32-bit instructions

    Backward compatible with the previous 16-bit Thumb instruction set

    12 new instructions, including several DSP-type instructions

    Memory footprint similar to Thumb, performance similar to ARM!

    No more interworking between ARM and Thumb modes!

  • 13/39

    Compact Code and Data Memory

    Cortex-M3 supports unaligned data accesses to improve constant-data and RAM utilization

    [Figure: structure management example. The same mix of long (32-bit), int (16-bit) and char (8-bit) members is shown two ways. On a 32-bit machine that does not support unaligned data, the aligned layout leaves unused (wasted) space. Packed on the Cortex-M3, the layout leaves free space for the rest of the application.]

    Reduces SRAM memory requirements by over 25%

    Less memory = lower-cost devices!
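
The following is a minimal C sketch of the structure-packing idea. It assumes a GCC- or Clang-compatible compiler, because `__attribute__((packed))` is a compiler extension and not standard C. Fixed-width types stand in for the figure's long/int/char. The record and its member names are hypothetical and are not the one in the figure. On common ABIs such as the ARM AAPCS, where `uint32_t` is 4-byte aligned, natural alignment pads this record to 20 bytes. Packed, it needs only the 13 bytes its members occupy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record mixing 32-, 16- and 8-bit members, naturally aligned. */
struct aligned_record {
    uint8_t  flags;    /* offset 0, followed by 3 padding bytes */
    uint32_t count;    /* offset 4 */
    uint8_t  mode;     /* offset 8, followed by 1 padding byte */
    uint16_t length;   /* offset 10 */
    uint8_t  status;   /* offset 12, followed by 3 padding bytes */
    uint32_t checksum; /* offset 16; total size 20 */
};

/* Same members with no padding (GCC/Clang extension). */
struct __attribute__((packed)) packed_record {
    uint8_t  flags;    /* offset 0 */
    uint32_t count;    /* offset 1 (unaligned) */
    uint8_t  mode;     /* offset 5 */
    uint16_t length;   /* offset 6 */
    uint8_t  status;   /* offset 8 */
    uint32_t checksum; /* offset 9 (unaligned); total size 13 */
};

/* Bytes saved per record by packing. */
size_t bytes_saved_per_record(void) {
    return sizeof(struct aligned_record) - sizeof(struct packed_record);
}
```

Packing this record saves 7 of every 20 bytes (35%). On a core that supports unaligned word and halfword accesses, such as the Cortex-M3, the compiler can still read the unaligned members with ordinary loads, depending on its settings. Where unaligned access is unavailable, it has to fall back to byte accesses. A portable alternative is to order the members from largest to smallest. That brings this record down to 16 bytes without any unaligned fields.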

  • 14/39

    Atomic Bit Manipulation via Bit Banding

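
The slide itself is only a figure, so here is a hedged sketch of the mechanism. The Cortex-M3 maps each bit of the first 1 MB of SRAM (0x20000000) and of the peripheral region (0x40000000) to its own 32-bit word in an alias region 32 MB higher (0x22000000 and 0x42000000). The alias address is `alias_base + byte_offset * 32 + bit * 4`. Writing 1 or 0 to that word sets or clears the single bit atomically, with no software read-modify-write sequence. The helper name `bitband_alias` is hypothetical.

```c
#include <stdint.h>

#define BITBAND_REGION_SIZE  0x00100000u /* each bit-band region covers 1 MB */
#define BITBAND_ALIAS_OFFSET 0x02000000u /* alias region starts 32 MB above it */

/* Return the bit-band alias word address for (addr, bit), or 0 if addr lies
   outside the SRAM (0x20000000) or peripheral (0x40000000) bit-band regions. */
uint32_t bitband_alias(uint32_t addr, unsigned bit) {
    uint32_t region = addr & 0xF0000000u;
    if ((region != 0x20000000u && region != 0x40000000u) || bit > 7u)
        return 0u;
    uint32_t byte_offset = addr - region;
    if (byte_offset >= BITBAND_REGION_SIZE)
        return 0u;
    /* Each byte expands to 8 alias words (32 bytes); each bit is one word. */
    return region + BITBAND_ALIAS_OFFSET + byte_offset * 32u + bit * 4u;
}
```

On the target, a hypothetical flag bit would then be set with `*(volatile uint32_t *)bitband_alias(addr, bit) = 1u;`. Because this is a single store, an interrupt cannot split it.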

  • 15/39

    NVIC Interrupt Handling

    Interrupts are handled in hardware! There's no instruction overhead.

    12-cycle Entry:

    Processor state is automatically saved to the stack over the data bus: {PC, xPSR, R0-R3, R12, LR}.

    In parallel, the ISR is prefetched on the instruction bus, so it is ready to start executing as soon as the stack PUSH completes.

    12-cycle Exit:

    Processor state is automatically restored from the stack.

    In parallel, the interrupted instruction is prefetched, ready for execution upon completion of the stack POP.

    The stack POP can be interrupted, allowing a new ISR to be executed immediately without the overhead of state saving.
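
To make the hardware stacking concrete, here is a minimal sketch of the eight-word frame the Cortex-M3 pushes on exception entry. The registers appear in ascending address order, with R0 at the stack pointer and xPSR highest. The struct and helper names are hypothetical. A fault handler that receives the stacked SP could use such a layout to read the interrupted PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame pushed by the Cortex-M3 hardware,
   lowest address first (after stacking, SP points at r0). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address: where execution resumes */
    uint32_t xpsr; /* program status register */
} exception_frame;

/* Read the stacked return address from a frame located at the stacked SP. */
uint32_t stacked_pc(const exception_frame *frame) {
    return frame->pc;
}
```

The eight registers the slide lists make a 32-byte frame, which the core writes over the D-bus while the ISR is fetched over the I-bus.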

  • 16/39

    Fast Interrupt Response and Tail Chaining

    Here we have two simultaneous interrupts, with IRQ1 at the higher priority: when IRQ1 is finished, IRQ2 is serviced.

    [Timing diagram. ARM7, interrupt handling in assembler code: PUSH (26 cycles), ISR 1, POP (16 cycles), PUSH (26 cycles), ISR 2, POP (16 cycles), i.e. 42 cycles between ISR 1 and ISR 2. Cortex-M3, interrupt handling in hardware: PUSH (12 cycles), ISR 1, tail-chaining (6 cycles), ISR 2, POP (12 cycles).]

    In the Cortex-M3, ISR 2 starts after only a 6-cycle delay because it has been tail-chained.
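
The cycle counts in the timing diagram can be checked with a little arithmetic. The sketch below simply encodes the slide's figures: 26/16 push/pop cycles for the ARM7 software handler, and 12/12 plus a 6-cycle tail-chain for the Cortex-M3. The type and function names are hypothetical, and the ISR bodies themselves are excluded.

```c
/* Per-interrupt overhead in cycles, as quoted on the slide. */
typedef struct {
    unsigned push;       /* state save on entry */
    unsigned pop;        /* state restore on exit */
    unsigned tail_chain; /* ISR-to-ISR switch when tail-chaining, 0 if unsupported */
} irq_costs;

/* Cycles between the end of ISR 1 and the start of ISR 2. */
unsigned gap_between_isrs(irq_costs c) {
    /* Without tail-chaining, ISR 1 must pop fully before ISR 2 pushes again. */
    return c.tail_chain ? c.tail_chain : c.pop + c.push;
}

/* Total save/restore overhead for two back-to-back ISRs. */
unsigned overhead_two_isrs(irq_costs c) {
    return c.push + gap_between_isrs(c) + c.pop;
}
```

For the two simultaneous interrupts, this gives 84 cycles of overhead on the ARM7 against 30 on the Cortex-M3. The gap between the ISRs drops from 42 cycles to 6.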

  • 17/39

    System Timer (SysTick)

    Flexible system timer that is part of the Cortex-M3 core

    24-bit self-reloading down counter with end-of-count interrupt

    Two configurable clock sources

    Suitable for a real-time OS or other scheduled tasks

    Accessible only when executing in privileged mode
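
SysTick counts down from its reload value to zero, so one tick period is RELOAD + 1 clock cycles, and RELOAD must fit in 24 bits. Below is a minimal sketch of the reload calculation, assuming the timer runs from the core clock. The function name is hypothetical; on real hardware, CMSIS provides `SysTick_Config()` for the same purpose.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* 24-bit down counter */

/* Compute the SysTick reload value for a tick rate of tick_hz from clock_hz.
   Returns false if the requested period does not fit the 24-bit counter. */
bool systick_reload(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload) {
    if (tick_hz == 0u)
        return false;
    uint32_t cycles = clock_hz / tick_hz; /* clock cycles per tick */
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD)
        return false;                     /* a reload of 0 produces no ticks */
    *reload = cycles - 1u;                /* counter visits RELOAD..0 */
    return true;
}
```

At 72 MHz, a 1 kHz RTOS tick needs a reload of 71 999. The slowest tick the 24-bit counter can produce at that clock is about 4.3 Hz (72 MHz / 2^24).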

  • 18/39

    STM32 On-Chip Flash Memory Interface

    Mission: support 72 MHz operation directly from Flash memory

    64-bit-wide Flash with prefetch (two 64-bit buffers)

    [Block diagram: the 64-bit Flash memory array feeds the Flash interface. Its arbiter serves the Cortex-M3 CPU's instruction bus, where each 64-bit line carries 16-bit and 32-bit Thumb-2 instructions, and its data/debug bus, which carries 32-bit, 16-bit and 8-bit data.]
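
The prefetch buffers matter because Flash is slower than the core. This is a hedged sketch: the thresholds below are the Flash latency settings the reference manual gives for STM32F10x devices (0 wait states up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz). The function names are hypothetical. One 64-bit line then costs roughly 1 + wait-state cycles to read but delivers up to four 16-bit Thumb-2 instructions.

```c
#include <stdint.h>

/* Flash wait states for an STM32F10x-class device at a given SYSCLK.
   Returns -1 above the 72 MHz maximum. */
int flash_wait_states(uint32_t sysclk_hz) {
    if (sysclk_hz <= 24000000u) return 0;
    if (sysclk_hz <= 48000000u) return 1;
    if (sysclk_hz <= 72000000u) return 2;
    return -1;
}

/* Approximate core cycles to read one 64-bit Flash line (access + wait states). */
int line_fetch_cycles(uint32_t sysclk_hz) {
    int ws = flash_wait_states(sysclk_hz);
    return ws < 0 ? -1 : 1 + ws;
}

/* Instructions of a given width (16 or 32 bits) held in one 64-bit line. */
unsigned instructions_per_line(unsigned instr_bits) {
    return 64u / instr_bits;
}
```

At 72 MHz a line costs about 3 cycles, while four 16-bit instructions take at least 4 cycles to execute. The next line can therefore usually be fetched in the background. Runs of 32-bit instructions (two per line) or branches can still stall.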

  • 19/39

    Low Power Modes

    [Table: STM32L low-power modes. For each mode the original also shows whether the CPU, peripherals, high- and medium-speed oscillators, RTC calendar, LSI and RAM are ON, OFF or can be enabled.]

    Mode                   STM32L current consumption   STM32L wake-up time
    RUN (from Flash)       230 µA/MHz                   -
    RUN (from RAM)         185 µA/MHz                   -
    LP RUN                 11 µA                        -
    LP SLEEP               6 µA                         0.35 µs
    STOP w/ full RTC       1.3 µA                       8 µs
    STOP w/o RTC           0.43 µA                      57 µs
    STANDBY w/ full RTC    1 µA                         8 µs
    STANDBY w/o RTC        0.27 µA                      57 µs
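
The table's figures can be turned into a first-order battery-life estimate for a duty-cycled design. The sketch assumes a simple two-state model (run, then a low-power mode) and ignores wake-up transitions and peripheral currents. The function names, the clock frequency, the duty cycle and the battery capacity used below are all hypothetical.

```c
/* Average current (µA) for a device that is active for run_fraction of the
   time and in a low-power mode for the rest; wake-up transitions are ignored. */
double average_current_ua(double run_ua, double low_power_ua, double run_fraction) {
    return run_fraction * run_ua + (1.0 - run_fraction) * low_power_ua;
}

/* Battery life in hours for a capacity in mAh and an average current in µA. */
double battery_life_hours(double capacity_mah, double avg_current_ua) {
    return capacity_mah * 1000.0 / avg_current_ua;
}
```

Take RUN from Flash at 230 µA/MHz at a hypothetical 16 MHz (3680 µA), a 1% duty cycle, and STOP with full RTC (1.3 µA) in between. The average is about 38 µA, or roughly 5800 hours from a hypothetical 220 mAh cell. The run phase contributes almost all of that average, so shortening active time pays off more than a lower sleep current.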

  • 20/39

    MCU Platform for Rapid Innovation

    Powerful core

    Fully compatible product portfolio

    Complete ecosystem

    Product lines: Connectivity line, Performance line, Access line, Value line, USB Access line, Low-power with LCD, Low-power

  • 21/39

    2D Barcode Scanner

    Easy and fast product transition thanks to a scalable architecture and compatible product family

  • 23/39

    Innovative System Architecture

    [Block diagram: a multi-AHB bus matrix connects masters to slaves.
    Masters: Cortex-M3 at 120 MHz with MPU (Master 1, via I-bus, D-bus and S-bus); dual-port DMA1 and DMA2, each with FIFO and 8 streams; High Speed USB 2.0 and Ethernet 10/100, each with its own FIFO/DMA.
    Slaves: Flash up to 1 Mbyte through the ART Accelerator on I-Code and D-Code; SRAM1 112 KB; SRAM2 16 KB; FSMC; AHB1 fast peripherals and GPIOs; dual-port AHB1-APB1 bridge to slow peripherals; dual-port AHB1-APB2 bridge; AHB2 with DCMI, crypto and USB Full Speed.]

  • 24/39

    2D Barcode Scanner

    [Block diagram: camera I/F feeding image processing (DSP calculations); High Speed USB device (HID class) to a USB connector; USART to a USART connector; user interface on GPIO; 802.15.4 radio; DAC / I2S]

  • 27/39

    Audio Docking Station

    [Block diagram: Cortex-M3 CPU with Flash, SRAM and DMA running an MP3 decoder (2 instances), an MP3 encoder, a volume / channel mixer, loudness control and a 5-band equalizer.
    Audio path: a microphone and pre-amplifier feed the 12-bit ADC; an audio DAC and amplifier drive speakers / headset.
    Interfaces and clocks: I2S, FSMC, SDIO and USB interfaces; PLL block and XTAL oscillators (32 kHz + 3-25 MHz); FS USB host*, MS class* and file system*; HMI control and display (touchscreen QVGA LCD).
    Audio media: USB mass storage device, USB key.]

  • 28/39

    Architecture: DMA & Multi-Bus Matrix

    [Block diagram: five masters on the multi-AHB bus matrix.
    Masters: Cortex-M3 at 120 MHz with MPU (Master 1, I-Code/D-Code to Flash up to 1 Mbyte through the ART Accelerator); dual-port DMA1 (Master 2); dual-port DMA2 (Master 3); High Speed USB 2.0 (Master 4); Ethernet 10/100 (Master 5).
    Slaves: SRAM1 112 KB, SRAM2 16 KB, FSMC, AHB1 fast peripherals and GPIOs, the AHB1-APB1 (slow peripherals) and AHB1-APB2 bridges, and AHB2 (DCMI, crypto, USB Full Speed).
    Annotation: fast peripheral bus to bypass the bus matrix.]

  • 30/39

    ART Accelerator™: the bottom line!

    [Chart: performance (MIPS, 0 to 160) versus core frequency (MHz, 0 to 150) for the STM32F200, MCU A and MCU B, annotated "Impact of wait states" and "Slow flash". STM32F200 performance is almost linear with frequency.]

  • 32/39

    What is CoreMark?

    Simple, yet sophisticated

    Comprehensive documentation and run rules

    Free, but not cheap

    Open C source code, downloadable from the EEMBC website

    Dhrystone Terminator

    The benefits of Dhrystone without all the shortcomings

    Free, small, easily portable

    CoreMark does real work

  • 33/39

    Exposing Dhrystone Weaknesses

    Major portions of Dhrystone are susceptible to a compiler's ability to optimize the work away - NOT CoreMark

    Library calls are made within the timed portion and dominate the time consumed by the benchmark - NOT CoreMark

    Completely synthetic and does not mimic any behavior that can be expected in a real application - NOT CoreMark

    No official source code, resulting in different and often undisclosed versions (1.1, 2.0, 2.1) - NOT CoreMark

    Very vague and ambiguous run guidelines that are not universally known and are not enforced - NOT CoreMark

    Results reported in several different metrics (DMIPS, Dhrystones per second, DMIPS/MHz) - NOT CoreMark

  • 34/39

    CoreMark Workload Features

    Matrix manipulation allows the use of MAC and common math ops

    State machine operation represents data-dependent branches

    Cyclic Redundancy Check (CRC) is a very common embedded function

    Testing for:

    A processor's basic pipeline structure

    Basic read/write operations

    Integer operations

    Control operations
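
CRC is one of the CoreMark workloads listed above. The example below illustrates the kind of bit-wise, branch-heavy integer work such a kernel exercises. It is a minimal CRC-16 in C using the reflected polynomial 0xA001, the common CRC-16/ARC variant. This is a generic sketch, not CoreMark's own source, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-wise CRC-16 with the reflected polynomial 0xA001 (x^16 + x^15 + x^2 + 1),
   initial value 0x0000 and no final XOR: the CRC-16/ARC variant. */
uint16_t crc16_arc(const uint8_t *data, size_t len) {
    uint16_t crc = 0x0000u;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                   /* fold the next byte into the low bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)                 /* data-dependent branch on every bit */
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return crc;
}
```

The standard check value for this variant is 0xBB3D, which is the CRC of the ASCII string "123456789".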

  • 35/39

    Summary: The Value of CoreMark

    Simple to use, yet sufficiently sophisticated for benchmarking a processor core

    Freely available, with limited usage restrictions

    Provides an industry-standard tool that allows users to begin embedded processor analysis

    Introduces all processor users to the overall value of EEMBC

  • 36/39

    EEMBC CoreMark 1.0 - Summary

    [Chart: CoreMark score (iterations/s, 0 to 250) versus frequency (MHz) for the STM32F2xx: 190.30 at 100 MHz and 228.6 at 120 MHz]
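
Dividing the score by the clock frequency gives CoreMark/MHz, a frequency-normalized figure commonly used to compare cores. Here is a minimal sketch using the two STM32F2xx results from the chart; the function name is hypothetical.

```c
/* CoreMark per MHz: iterations per second divided by core clock in MHz. */
double coremark_per_mhz(double iterations_per_sec, double clock_mhz) {
    return iterations_per_sec / clock_mhz;
}
```

Both results work out to about 1.90 CoreMark/MHz: 1.905 at 120 MHz and 1.903 at 100 MHz. That is consistent with the near-linear scaling with frequency claimed on the ART Accelerator slide.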

  • 38/39

    Let's review!

    Not all Cortex-M3 microcontrollers are the same!

    Things to consider in your next high-performance embedded design:

    An innovative multi-layer bus architecture

    Highly optimized flash memory controller

    The efficient Cortex-M3 core

    A cutting-edge 90 nm process technology, enabling 120 MHz performance with very low power consumption
