Design Issues for Memory Architecture in Embedded Systems Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University

Feb 03, 2016

Transcript
  • Design Issues for Memory Architecture in Embedded Systems. Chia-Lin Yang, Dept. of Computer Science and Information Engineering, National Taiwan University

  • Memory Hierarchy in an Embedded System

    [Figure: block diagram of the memory hierarchy. An on-chip interconnection network connects a CPU (I-cache and D-cache), a DSP (local memory or cache), IP cores (private local memory, DMA), and a shared memory. A memory controller serves the DRAM memory banks, and a flash controller serves the multi-banked flash memory chips of the flash-based storage.]

  • On-Chip Memory Design Issue

    Memory/data allocation
    - Decide the configuration of the on-chip memory architecture
    - Which data are allocated to on-chip memories? Consider data access frequency and lifetime
    - How much on-chip memory is required? How many DMAs are sufficient?

    Private or shared memory?
    - Private memory: lower contention, lower storage efficiency due to duplicated copies
    - Shared memory: higher contention, higher storage efficiency
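The allocation question above can be illustrated with a minimal sketch. It assumes a single on-chip memory of fixed capacity and a greedy "most accesses per byte first" heuristic; the object names, sizes, and access counts are hypothetical, and data lifetime (which the slide also asks you to consider) is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    name: str
    size_bytes: int
    accesses: int  # profiled access count (hypothetical numbers)


def allocate_on_chip(objects, capacity_bytes):
    """Greedy heuristic: place objects with the most accesses per byte first."""
    chosen = []
    free = capacity_bytes
    ranked = sorted(objects, key=lambda o: o.accesses / o.size_bytes, reverse=True)
    for obj in ranked:
        if obj.size_bytes <= free:  # skip anything that no longer fits
            chosen.append(obj.name)
            free -= obj.size_bytes
    return chosen, free


objects = [
    DataObject("coeff_table", 512, 40_000),
    DataObject("frame_buffer", 8192, 60_000),
    DataObject("state_vars", 128, 25_000),
    DataObject("log_buffer", 1024, 500),
]
on_chip, unused = allocate_on_chip(objects, capacity_bytes=2048)
print(on_chip, unused)
```

The heavily accessed but large `frame_buffer` stays off chip because it does not fit, which is the kind of trade-off the "how much on-chip memory is required" question is about.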

  • On-Chip Memory Issue (contd)

    Low-leakage memory
    - State-destroying technique
    - State-preserving technique

    When to turn a memory line into low-leakage mode
    - Hardware-controlled cache: periodically turn off a cache line, or turn off a cache line when it has not been accessed for a period of time
    - Software-managed addressable memory: data lifetime analysis at compile time

    [Figure: drowsy SRAM line circuit. A drowsy bit, set by the drowsy signal and reset on wake-up, drives a voltage controller that switches the line's power line between 1 V and 0.3 V; a word-line gate sits between the row decoder / word-line driver and the SRAM cells.]
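The "turn off a cache line when it has not been accessed for a period of time" rule can be sketched for a single line as follows. This is an illustrative model, not the author's mechanism: the function name, the access trace, and the assumption that the line starts active at time 0 are all hypothetical.

```python
def simulate_decay(access_times, decay_interval, end_time):
    """Return (cycles_in_low_leakage_mode, wakeups) for one cache line.

    The line enters low-leakage mode once it has been idle for
    decay_interval cycles and is woken up by the next access.
    """
    low_cycles = 0
    wakeups = 0
    last = 0  # assumption: the line is active and touched at time 0
    for t in sorted(access_times):
        idle = t - last
        if idle > decay_interval:
            low_cycles += idle - decay_interval
            wakeups += 1  # this access finds the line in low-leakage mode
        last = t
    tail = end_time - last
    if tail > decay_interval:
        low_cycles += tail - decay_interval
    return low_cycles, wakeups


low, wakeups = simulate_decay([10, 12, 500, 505, 2000],
                              decay_interval=100, end_time=3000)
print(low, wakeups)
```

A shorter decay interval increases the time spent in low-leakage mode but also the number of wake-ups, each of which costs latency (state-preserving) or a refetch (state-destroying).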

  • DRAM Management

    [Figure: the memory hierarchy block diagram shown earlier, repeated as the lead-in to the DRAM part of the talk.]

  • Memory Controller Design Issue

    Challenges in memory controller design for MPSoCs
    - Concurrent main memory accesses with different access patterns: multiple streams, random accesses, etc.

    Limitations of a conventional DRAM controller
    - Unaware of DRAM status
    - Lack of control over the bandwidth allocation for different PEs
    - Significant access latencies due to fair scheduling policies

    [Figure: memory controller containing request buffers and a memory access scheduler.]

  • Smart Memory Controller Design

    Stream prefetching
    - Identify streams at runtime or compile time, and perform stream prefetching

    Address pre-computation
    - Multimedia processing units usually have regular address patterns
    - 1-D linear address generator: audio codec
    - 2-D block-based address generator: MPEG-2 motion compensation, DCT
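The two address patterns named above can be sketched in a few lines. The function names, the row-major frame layout, and the example parameters are assumptions made for illustration.

```python
def linear_addresses(base, count, stride=1):
    """1-D linear generator: base, base + stride, base + 2*stride, ..."""
    return [base + i * stride for i in range(count)]


def block_addresses(base, frame_width, x, y, block_w, block_h):
    """2-D block generator: addresses of a block_w x block_h block whose
    top-left element is (x, y) in a frame stored row by row."""
    return [base + (y + r) * frame_width + (x + c)
            for r in range(block_h)
            for c in range(block_w)]


audio = linear_addresses(base=0x1000, count=4)
block = block_addresses(base=0, frame_width=16, x=4, y=2, block_w=2, block_h=2)
print(audio, block)
```

Because both sequences are fully determined by a handful of parameters, a controller that knows them can compute upcoming addresses ahead of the requests.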

  • Smart Memory Controller Design (contd)

    Row buffer management
    - Close page policy: precharge as soon as possible; good for random accesses
    - Open page policy: precharge as late as possible; good for accesses with high locality
    - Close page or open page policy? Access patterns differ within and among tasks => dynamic row buffer management
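A rough way to see why neither policy wins everywhere is to count cycles for one bank under each. The sketch below borrows the 3/3/1-cycle occupancies quoted on the memory access scheduling slide and assumes, hypothetically, that accesses are serialized and that the close page policy hides its precharge in idle time. `pick_policy` is an after-the-fact oracle for illustration only; a real dynamic policy would have to predict the pattern.

```python
T_PRECHARGE, T_ACTIVATE, T_COLUMN = 3, 3, 1  # cycles per DRAM operation


def open_page_cycles(rows):
    """Keep the row open; pay precharge + activate only on a row change."""
    cycles, open_row = 0, None
    for row in rows:
        if row == open_row:
            cycles += T_COLUMN  # row buffer hit
        else:
            if open_row is not None:
                cycles += T_PRECHARGE  # close the old row first
            cycles += T_ACTIVATE + T_COLUMN
            open_row = row
    return cycles


def close_page_cycles(rows):
    """Precharge right after every access (assumed hidden), so each
    access pays activate + column access."""
    return len(rows) * (T_ACTIVATE + T_COLUMN)


def pick_policy(rows):
    return "open" if open_page_cycles(rows) <= close_page_cycles(rows) else "close"


streaming = [5, 5, 5, 5, 5, 5]   # high row locality
scattered = [1, 9, 4, 7, 2, 8]   # every access hits a different row
print(pick_policy(streaming), pick_policy(scattered))
```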

  • Smart Memory Controller Design (contd)

    Memory access scheduling
    - Schedule accesses to different banks at the same time
    - Provide high utilization of DRAM bandwidth
    - Schedule accesses according to the state of the DRAM

    DRAM operations:
    - P: bank precharge (3-cycle occupancy)
    - A: row activation (3-cycle occupancy)
    - C: column access (1-cycle occupancy)

    [Figure: timeline of P, A, and C operations over DRAM cycles 1 to 17. With access scheduling, the example sequence completes in 17 DRAM cycles.]
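Scheduling "according to the state of the DRAM" can be sketched as a row-hit-first pick from the request buffer. This is a simplified illustration, not the scheduler from the slides: it assumes all requests are already buffered, ignores timing, and only counts row activations; the request list is hypothetical.

```python
def count_activations(requests, row_hit_first):
    """requests: list of (bank, row) in arrival order.

    In-order mode serves the oldest request. Row-hit-first mode serves the
    oldest request that hits a currently open row, falling back to the
    oldest request when there is no hit.
    """
    pending = list(requests)
    open_rows = {}  # bank -> currently open row
    activations = 0
    while pending:
        pick = 0
        if row_hit_first:
            for i, (bank, row) in enumerate(pending):
                if open_rows.get(bank) == row:
                    pick = i
                    break
        bank, row = pending.pop(pick)
        if open_rows.get(bank) != row:
            activations += 1  # row miss: the new row must be activated
            open_rows[bank] = row
    return activations


reqs = [(0, 1), (0, 2), (0, 1), (1, 7), (0, 2), (1, 7)]
in_order = count_activations(reqs, row_hit_first=False)
reordered = count_activations(reqs, row_hit_first=True)
print(in_order, reordered)
```

Fewer activations mean fewer precharge/activate pairs on the critical path, which is where the bandwidth gain of access scheduling comes from.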

  • Smart Memory Controller Design (contd)

    Task-aware scheduling policy
    - Schedule the accesses of a task together

    [Figure: timelines of memory access and computation for task1 and task2, contrasting the interference seen under memory contention with task-aware scheduling.]

  • Power Management in DRAMs

    DDR/DDR2 power management
    - Four different power modes: active standby, active powerdown, precharge standby, precharge powerdown
    - State transition events
      - CKE: clock enable signal; CKE must be high to serve requests
      - Sense amplifiers with or without data

    DDR power states:
    - Active: 1.0x mW
    - Active standby: 0.2x mW
    - Precharge standby: 0.18x mW
    - Active powerdown: 0.11x mW
    - Precharge powerdown: 0.04x mW

    [Figure: state diagram of the DDR power states. CKE low moves a standby state to the corresponding powerdown state and CKE high moves it back; the states are annotated with whether data is held in the sense amplifiers.]
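To get a feel for what the relative power figures above imply, the sketch below multiplies each state's power by the time spent in it. The residency numbers are hypothetical and transition overheads are ignored, so this is only a first-order estimate.

```python
# Relative power per state, taken from the slide (units of "x mW").
RELATIVE_POWER = {
    "active": 1.0,
    "active_standby": 0.2,
    "precharge_standby": 0.18,
    "active_powerdown": 0.11,
    "precharge_powerdown": 0.04,
}


def relative_energy(residency):
    """residency: mapping of state name -> time spent in that state."""
    return sum(RELATIVE_POWER[state] * t for state, t in residency.items())


# Hypothetical 1000-time-unit window with 100 units of actual accesses.
always_standby = relative_energy({"active": 100, "active_standby": 900})
with_powerdown = relative_energy({
    "active": 100,
    "active_standby": 100,
    "precharge_powerdown": 800,
})
print(always_standby, with_powerdown)
```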

  • DRAM Power Reduction Strategy

    Open page vs. close page
    - Row buffer hit: open page is more energy-efficient than close page, because it reduces row access power
    - Row buffer miss: open page is less energy-efficient than close page, because energy is wasted staying in active standby mode before the next row access
    - Close page or open page policy? Access patterns differ within and among tasks => dynamic row buffer management

    [Figure: DDR power-state diagram with the states active, active standby, active powerdown, precharge standby, and precharge powerdown.]

  • DRAM Power Reduction Strategy (contd)

    Increasing the idle period
    - Schedule the operations to active banks first
    - Request batching: clustering the requests in the memory controller

    Memory access pattern reshaping
    - Compiler approach: array interleaving (fine-grain power-aware data allocation)
    - Run-time approach: popularity layout

    [Figure: layout of array elements A[0] A[1] A[2] B[0] B[1] B[2], annotated with "temporal concentration" and "idle time".]

    Sample code: for(i=0;i
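The idea behind array interleaving can be shown with a small, self-contained model; it is not a reconstruction of the slide's sample code. It assumes two 1024-word arrays A and B that a loop reads together, banks of 1024 words, and the hypothetical layouts below.

```python
BANK_WORDS = 1024  # assumed bank size in words


def banks_touched(layout, iterations):
    """Set of banks accessed when the loop body runs for the given iterations."""
    return {addr // BANK_WORDS for i in iterations for addr in layout(i)}


def separate(i, n=1024):
    """A and B stored back to back: A[i] at i, B[i] at n + i."""
    return (i, n + i)


def interleaved(i):
    """A and B interleaved element by element: A[i] at 2i, B[i] at 2i + 1."""
    return (2 * i, 2 * i + 1)


first_half = range(0, 512)
second_half = range(512, 1024)
print(banks_touched(separate, first_half))      # both banks stay busy
print(banks_touched(interleaved, first_half))   # only one bank is needed
print(banks_touched(interleaved, second_half))
```

With the interleaved layout only one bank is accessed at a time, so the other bank sees a long idle period and can sit in a low-power state.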

  • NTU-CoSim for Memory System Evaluation

    [Figure: simulation methodology. The application runs on an ISS with micro-architectural modeling; the ISS, the RTL/behavior blocks, and the DRAM controller (DRAM-Sim) are each placed behind a SystemC wrapper.]

  • Back-up Slides

  • Shared/Private Memory Allocation

    Customize the on-chip memory configuration by capturing the privately accessed and shared data across processors

  • NTU-CoSim for Memory System Evaluation

    Architecture specification
    - HW/SW partition
    - System configuration
      1. CPU frequency, voltage
      2. Cache architecture (associativity, line size, capacity)
      3. Interconnection
      4. SDRAM configuration (RAS, CAS, etc.)

    HW/SW co-simulation
    - SW in C
    - Parameterized IP (f, w) at RTL level or timed-functional level

    Power/performance monitor tool
    - Power/performance breakdown
    - I/D cache miss rate
    - Component activation
    - Bus utilization/contention

    Features of NTU-CoSim
    - Cycle-accurate power/performance model
    - Detailed SDRAM model
    - Tunable simulation platform for easy design space exploration

  • Target Architecture

    [Figure: two embedded processors, an SRAM, a DMA, and an off-chip bus interface attached to the on-chip bus; the off-chip bus connects a DDR/SDRAM memory controller with its SDRAM and two I/O devices.]

  • Power Management Challenge

    State transition overhead
    - Precharge powerdown → active standby ≈ 5 ns
    - The overhead is relatively small, but not negligible: read ≈ 75 ns, write ≈ 85 ns

    Break-even time

    [Figure: power versus time across the DDR power states (active, active standby, active powerdown, precharge standby, precharge powerdown), marking the 5 ns resynchronization overhead and the idle-power and low-power levels.]
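One common way to reason about break-even time is to ask how long an idle period must be before the energy saved in the low-power state pays for the wake-up. The model below is an assumption, not the formula from the slide: it charges the wake-up as `t_resync` spent at `p_wake`, and it plugs in the slide's relative power figures (active standby 0.2, precharge powerdown 0.04) and the 5 ns resynchronization time, with `p_wake` set to the active power of 1.0 as a guess.

```python
def break_even_time(p_idle, p_low, p_wake, t_resync):
    """Idle length T at which powering down starts to pay off.

    Stay awake:   E = p_idle * T
    Power down:   E = p_low * T + p_wake * t_resync
    Equal when    T = p_wake * t_resync / (p_idle - p_low)
    """
    if p_idle <= p_low:
        raise ValueError("low-power state must draw less than the idle state")
    return p_wake * t_resync / (p_idle - p_low)


t_be_ns = break_even_time(p_idle=0.2, p_low=0.04, p_wake=1.0, t_resync=5.0)
print(t_be_ns)
```

Under these assumptions an idle period shorter than about 31 ns is not worth a transition, which is why the later slides focus on lengthening idle periods.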

  • Flash-Based Storage in an Embedded System

    [Figure: the memory hierarchy block diagram shown earlier, repeated as the lead-in to the flash-based storage part of the talk: flash controller and multi-banked flash memory chips.]

  • Challenges in Adopting Flash-Based Solid State Drives

    Unique features of flash memory
    - Write-once, out-of-place update, garbage collection, limited write/erase cycles
    - Designs that assume a disk as the storage need to be revisited, e.g., the virtual memory system

    Reliability issue
    - As the technology shrinks to smaller geometries, new quality and reliability issues are added to the existing flash memory reliability issues
    - Adopting ECC, wear leveling

    Single flash chip bandwidth < disk bandwidth
    - Multiple flash chip system

    [Figure: multiple flash chip system, with the flash memory chips attached to a flash memory controller.]
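Out-of-place update is easiest to see in a toy page-mapping layer. The class below is a hypothetical sketch (the name `TinyFTL` and its fields are made up for illustration); it has no garbage collection or wear leveling and simply shows how rewriting a logical page leaves an invalid physical page behind.

```python
class TinyFTL:
    """Toy page-level mapping with out-of-place updates only."""

    def __init__(self, num_pages):
        self.mapping = {}                  # logical page -> physical page
        self.state = ["free"] * num_pages  # "free", "valid", or "invalid"
        self.next_free = 0

    def write(self, logical_page):
        if self.next_free >= len(self.state):
            raise RuntimeError("no free pages: garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            self.state[old] = "invalid"    # old copy cannot be overwritten in place
        new = self.next_free
        self.next_free += 1
        self.state[new] = "valid"
        self.mapping[logical_page] = new
        return new


ftl = TinyFTL(num_pages=4)
first = ftl.write(7)
second = ftl.write(7)   # update of the same logical page
print(first, second, ftl.state)
```

The invalid pages that accumulate are what garbage collection must reclaim by erasing whole blocks, and the erase count per block is what wear leveling tries to even out.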

  • Multiple flash chip system

    Imbalance between host interface bandwidth and flash memory bus/chip bandwidth
    - Host interface: 3 Gb/s
    - Single flash chip: 10 MB/s
    - Flash memory bus: 33 MB/s

    To increase the bandwidth of the flash subsystem
    - Parallel reads/writes
    - Data interleaving in each flash chip
    - Problem: data locality is destroyed, which makes garbage collection inefficient

    Parallelized garbage collection
    - Read live data
    - Write live data
    - Erase the block

    [Figure: parallelized garbage collection. Live data is read from the victim block into the controller buffer over the flash memory bus, written to free blocks across the flash chips, and the victim block is then erased.]
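The bandwidth gap quoted above can be turned into a back-of-the-envelope sizing. The sketch assumes 3 Gb/s equals 375 MB/s (line-coding overhead on the host link is ignored) and uses a simple round-robin striping function, `stripe`, which is a hypothetical illustration of data interleaving rather than the controller's actual mapping.

```python
import math

HOST_MBPS = 3_000 / 8   # 3 Gb/s expressed in MB/s (assumption: no coding overhead)
BUS_MBPS = 33           # one flash memory bus
CHIP_MBPS = 10          # one flash chip

chips_per_bus = math.ceil(BUS_MBPS / CHIP_MBPS)   # chips needed to fill one bus
buses_needed = math.ceil(HOST_MBPS / BUS_MBPS)    # buses needed to match the host


def stripe(logical_page, num_chips):
    """Round-robin interleaving: returns (chip index, page index within chip)."""
    return logical_page % num_chips, logical_page // num_chips


print(chips_per_bus, buses_needed, stripe(5, chips_per_bus))
```

Consecutive logical pages land on different chips, which is what enables parallel reads and writes and also what scatters related data across chips.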