
Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Jan 05, 2016

Page 1: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Ross McIlroy, Peter Dickman and Joe Sventek

Carnegie Trust for the Universities of Scotland

Page 2: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Scratch-Pad Memory Allocator

SMA: A dynamic memory allocator targeting extremely small memories (< 1MB in size)

•Why target such tiny memories?

•Why provide dynamic memory allocation for such small memories?

Page 3: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 4: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 5: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

What Tiny Memories?

•Embedded Systems

– Sensor Network Motes

– Vehicular Devices

•Scratch-Pad Memories

– Network Processors

– Heterogeneous Multi-Core Processors

Page 6: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Scratch-Pad Memories

•Memory structured as a hierarchy

– Small fast memories, large slow memories

•Usually hidden by hardware caches

•Some processor architectures employ scratch-pad memories instead

– Similar size and speed as caches, but explicitly accessible by software

•Examples

– IBM Cell processor

– Intel IXP network processors

– Intel PXA mobile phone processors

Page 7: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Why Dynamic Management?

•Developers want as much useful data in the fast Scratch-Pad memory as possible

•They don’t want to deal with the fragmented memory hierarchy

                              Manual  Static  Dynamic
Developer ease                  ✗       ✓       ✓
Make full use of Scratch-Pad    ✓       ✗       ✓

Page 8: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Why SMA?

Resource                      Doug Lea malloc   SMA malloc
State Memory (bytes)          516               40
Code Memory (instructions)    1634              297
Avg. Alloc Time (cycles)      70.7              72.8
Avg. Free Time (cycles)       95.2              52.4

Managing 4kB Scratch-Pad memory on an Intel IXP processor

Page 9: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 10: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Basic Approach

•By default, represent memory coarsely as a series of fixed size blocks

– Can employ a very simple bitmap based allocation / free algorithm

•When required, split blocks into variable sized regions

– Prevents excessive internal fragmentation

Page 11: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Large Block Allocation

•Each block in memory represented by a bit in a free-block bitmap

Free-block bitmap (a set bit marks a free block): 11110011111111110011000000111111

rem_blocks = blocks_bm & ~mask;
next_pos = ffs(rem_blocks);

in_use = mask & ~blocks_bm;
next_pos = fls(in_use) + 1;
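The two bitmap expressions on this slide suggest a first-fit scan for a run of free blocks. The following is a minimal sketch of that idea in C, not the authors' implementation. It assumes a 32-bit bitmap in which a set bit means "free" and bit 0 is block 0 (the slide does not state the bit order). `ffs32` and `fls32` are portable stand-ins for `ffs` and `fls` that return 1-based positions, and `alloc_blocks`, `free_blocks` and `run_mask` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the lowest set bit, 0 if none (same convention as ffs). */
static int ffs32(uint32_t x) {
    int pos = 1;
    if (x == 0) return 0;
    while ((x & 1u) == 0) { x >>= 1; pos++; }
    return pos;
}

/* 1-based index of the highest set bit, 0 if none. */
static int fls32(uint32_t x) {
    int pos = 0;
    while (x != 0) { x >>= 1; pos++; }
    return pos;
}

/* Mask with n consecutive bits set starting at bit pos (pos + n <= 32). */
static uint32_t run_mask(int pos, int n) {
    uint32_t ones = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return ones << pos;
}

/* First-fit search for n contiguous free blocks. On success, clears their
 * bits in *blocks_bm and returns the index of the first block; returns -1
 * if no run of n free blocks exists. */
int alloc_blocks(uint32_t *blocks_bm, int n) {
    assert(n >= 1 && n <= 32);
    int pos = ffs32(*blocks_bm) - 1;            /* first free block, or -1 */
    while (pos >= 0 && pos + n <= 32) {
        uint32_t mask = run_mask(pos, n);
        uint32_t in_use = mask & ~*blocks_bm;   /* used blocks inside the window */
        if (in_use == 0) {
            *blocks_bm &= ~mask;                /* claim the run */
            return pos;
        }
        /* Skip past the highest used block in the window, then jump to the
         * next free block at or above that position. */
        int next = fls32(in_use);               /* index of that block, plus one */
        uint32_t below = (next == 32) ? 0xFFFFFFFFu : ((1u << next) - 1u);
        uint32_t rem_blocks = *blocks_bm & ~below;
        pos = ffs32(rem_blocks) - 1;
    }
    return -1;
}

/* Return n blocks starting at pos to the free pool (pos + n <= 32). */
void free_blocks(uint32_t *blocks_bm, int pos, int n) {
    *blocks_bm |= run_mask(pos, n);
}
```

Each failed window skips past all the used blocks it found, so the scan needs at most one iteration per used block rather than one per bit.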

Page 12: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Small Region Allocation

•Unused parts of an allocated block can be reused by sub-block sized allocations

•Blocks are split into power of two sized regions, in a Binary Buddy type approach

•Free regions are stored in per-size free lists
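One way to picture the splitting step is to carve the unused tail of a block into power-of-two regions, each aligned to its own size as a binary buddy scheme requires. The sketch below illustrates that under stated assumptions and is not SMA's actual code. The 128-byte block size and 4-byte minimum region are assumptions (the slides give neither), and `split_tail`, `size_class` and `struct region` are hypothetical names.

```c
#include <assert.h>

#define BLOCK_SIZE 128u   /* assumed block size in bytes (not from the slides) */
#define MIN_REGION 4u     /* assumed smallest region size in bytes */
#define MAX_REGIONS 8     /* more than any tail of a 128-byte block can need */

struct region { unsigned offset; unsigned size; };

/* Split the unused tail [used, BLOCK_SIZE) of a block into power-of-two
 * regions, each aligned to its own size. `used` is rounded up to a multiple
 * of MIN_REGION first. Returns the number of regions written to out. */
int split_tail(unsigned used, struct region out[MAX_REGIONS]) {
    assert(used <= BLOCK_SIZE);
    unsigned off = (used + MIN_REGION - 1u) & ~(MIN_REGION - 1u);
    int count = 0;
    while (off < BLOCK_SIZE) {
        /* The lowest set bit of off is the largest size off is aligned to. */
        unsigned size = off & (~off + 1u);
        if (off == 0) size = BLOCK_SIZE;        /* whole block is unused */
        out[count].offset = off;
        out[count].size = size;
        count++;
        off += size;
    }
    return count;
}

/* Index of the smallest per-size free list whose regions can hold `size`
 * bytes: 0 for MIN_REGION, 1 for 2 * MIN_REGION, and so on. */
int size_class(unsigned size) {
    int cls = 0;
    for (unsigned s = MIN_REGION; s < size; s <<= 1) cls++;
    return cls;
}
```

For example, a 100-byte allocation in a 128-byte block leaves 28 bytes, which this sketch splits into regions of 4, 8 and 16 bytes at offsets 100, 104 and 112. Each region would then be pushed onto the free list selected by `size_class`.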

Page 13: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Coalescing Freed Regions

•We wanted to avoid boundary tags

•Instead the orderly way in which regions are split is exploited

•A word sized coalesce tag stores the coalesce details for all regions in a block
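The slides do not give the encoding of the coalesce tag, so the sketch below uses a purely illustrative one: a 32-bit word with one bit per 4-byte granule of an assumed 128-byte block, where a set bit means the granule is free. It is meant only to show how buddy coalescing can work from a single per-block word instead of boundary tags. The function names are hypothetical, and the sketch assumes eager coalescing and leaves out the free-list updates.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 128u  /* assumed block size in bytes */
#define GRANULE 4u       /* assumed minimum region size in bytes */

/* Bits of the tag that cover the region [offset, offset + size). */
static uint32_t region_bits(unsigned offset, unsigned size) {
    unsigned n = size / GRANULE;
    uint32_t ones = (n >= 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return ones << (offset / GRANULE);
}

/* Mark a region free in the block's tag, then merge it with its buddy for
 * as long as the buddy is entirely free. The merged region is returned
 * through *offset and *size. */
void free_and_coalesce(uint32_t *tag, unsigned *offset, unsigned *size) {
    assert(*size >= GRANULE && (*size & (*size - 1u)) == 0); /* power of two */
    assert(*offset % *size == 0 && *offset + *size <= BLOCK_SIZE);
    *tag |= region_bits(*offset, *size);
    while (*size < BLOCK_SIZE) {
        unsigned buddy = *offset ^ *size;   /* buddy differs in one offset bit */
        uint32_t bits = region_bits(buddy, *size);
        if ((*tag & bits) != bits) break;   /* buddy is at least partly in use */
        *offset &= ~*size;                  /* merged region starts at the lower buddy */
        *size <<= 1;
    }
}
```

Because regions are always split on aligned power-of-two boundaries, a region's buddy is found by flipping one bit of its offset, so no per-region header is needed to locate the neighbour.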

Page 14: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Deferred Coalescing

•SMA (CAM)

– Any size can have coalescing deferred

– Content addressable memory used to associate the size of deferred coalesced regions with the regions themselves

•SMA (LM)

– Sizes for which coalescing can be deferred are chosen at compile time

– Deferred regions stored in an array in local memory
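As a rough illustration of the SMA (LM) variant, the sketch below parks a freed region in a small array instead of coalescing it, so that a later allocation of the same size can reuse it directly. The particular deferred sizes, the single slot per size, and the names `defer_free` and `take_deferred` are all assumptions made for the example and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time choice of sizes (bytes) with deferred coalescing. */
#define NUM_DEFERRED 3
static const unsigned deferred_sizes[NUM_DEFERRED] = { 8u, 16u, 32u };

/* One slot per deferred size, standing in for the array in local memory.
 * NULL means the slot is empty. */
static void *deferred_slot[NUM_DEFERRED];

/* Slot index for `size`, or -1 if coalescing is not deferred for that size. */
static int deferred_index(unsigned size) {
    for (int i = 0; i < NUM_DEFERRED; i++)
        if (deferred_sizes[i] == size) return i;
    return -1;
}

/* On free: park the region without coalescing if its size has an empty slot.
 * Returns 1 if parked, 0 if the caller must coalesce it as usual. */
int defer_free(void *region, unsigned size) {
    assert(region != NULL);
    int i = deferred_index(size);
    if (i < 0 || deferred_slot[i] != NULL) return 0;
    deferred_slot[i] = region;
    return 1;
}

/* On allocation: reuse a parked region of exactly this size if there is one.
 * Returns NULL if the caller must fall back to the free lists. */
void *take_deferred(unsigned size) {
    int i = deferred_index(size);
    if (i < 0) return NULL;
    void *region = deferred_slot[i];
    deferred_slot[i] = NULL;
    return region;
}
```

The trade-off is the usual one for deferred coalescing: repeated allocate and free cycles of a popular size avoid the split and merge work, at the cost of parked regions being temporarily unavailable for merging into larger ones.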

Page 15: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 16: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Experimental Setup

•Intel IXP 2350

– Network processor

– 4 microengine cores with 4kB local scratch-pad each

– Access to another 16kB of shared scratch-pad

•Compared against Doug Lea’s malloc

Benchmark  Description
a2p        Conversion of a 15kB text file to PostScript
gcc        Compilation of the file “combine.c” in the gcc source, using gcc
gst        Ghostscript extraction of a 682kB PostScript file
cvt        Application of the charcoal filter to a 1024x768 JPEG image using ImageMagick
ogg        Encoding of a 20 second WAV file using the ogg encoder
pyt        Execution of the Python example file “md5driver.py”
tar        Archive and gzip compression of 27 files in 4 directories into a 1MB archive

Page 17: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Allocation Performance

Page 18: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Free Performance

Page 19: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Memory Wastage

Page 20: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Memory Wastage

Page 21: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 22: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Lock-Free Block Allocation

•State for large blocks is stored in the free-block bitmap

•A simple lock-free update algorithm can be used to protect this bitmap

– Uses the test and clear primitive

[Diagram: Thread 1 and Thread 2 each work from the global bitmap 11110011111111110011000000111111 and apply Test & Clear to the bits they want, followed by an Atomic Set]
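One plausible reading of this slide is that a thread clears the bits it wants with an atomic test and clear, and uses an atomic set to hand back any bits it cleared when it did not get the whole run. The sketch below expresses that reading with C11 atomics, where `atomic_fetch_and` and `atomic_fetch_or` stand in for the hardware test-and-clear and atomic-set primitives. The function names are hypothetical, and a set bit is assumed to mean "free".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Try to claim the blocks selected by `mask` from the shared free-block
 * bitmap. Returns 1 on success, 0 if at least one wanted block was not free. */
int try_claim_blocks(_Atomic uint32_t *blocks_bm, uint32_t mask) {
    assert(mask != 0);
    /* Test and clear: atomically clear the wanted bits, fetching the old value. */
    uint32_t old = atomic_fetch_and(blocks_bm, ~mask);
    uint32_t got = old & mask;          /* bits this thread actually cleared */
    if (got == mask) return 1;          /* every wanted block was free */
    /* Partial claim: atomically set back only the bits we cleared, so blocks
     * owned by other threads stay marked as in use. */
    if (got != 0) atomic_fetch_or(blocks_bm, got);
    return 0;
}

/* Release previously claimed blocks with an atomic set. */
void release_blocks(_Atomic uint32_t *blocks_bm, uint32_t mask) {
    atomic_fetch_or(blocks_bm, mask);
}
```

A caller would retry with a different run of blocks when `try_claim_blocks` returns 0. Between the clear and the set-back, other threads may briefly see free blocks as taken, which can cause a spurious failure but never a double allocation.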

Page 23: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Protecting Small Region Lists

•Locks are used to protect the free-lists used for small size allocation

– SMA Coarse uses one lock

– SMA Fine uses one lock per size class

•In SMA Fine, when regions are being coalesced, two locks must be held briefly

Page 24: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Concurrency Scaling

Page 25: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Outline

•Rationale for SMA

•SMA Approach

•Results

•Concurrent SMA

•Conclusion / Future work

Page 26: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Future Work

•Provide the illusion of a single memory

•Let runtime worry about data placement

•Data can be annotated to give hints to the runtime system

Page 27: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Conclusion

•Tiny memories need to be managed too

•SMA is a simple and efficient algorithm for dynamic management of small memories

– Fixed size block allocation is simple and has low state overheads

– Splitting partially used blocks to be reused by small allocations limits fragmentation

•SMA can be augmented to support concurrent requests from multiple cores

Page 28: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Questions?

Page 29: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

16kB Management Allocation

Page 30: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

16kB Management Free

Page 31: Efficient Dynamic Heap Allocation of Scratch-Pad Memory

16kB Management Waste