An Introduction to Memory Optimization Techniques

Koray Hagen

Jun 13, 2015

A brief introduction to memory optimization techniques in C/C++ for Cal Poly Pomona students.

Transcript
Page 1: Introduction to Memory Optimization

An Introduction to Memory Optimization Techniques

Koray Hagen

Page 2: Introduction to Memory Optimization

My background

1. Software engineer in the games industry
   1. League of Legends
   2. Hearthstone
   3. PlayStation 4
   4. Xbox 360

2. Worked on many optimization problems for:

   1. Game scalability
   2. Content pipelines
   3. Client run-time
   4. Server run-time
   5. Data formats

3. One rule. Performance is king.

Page 3: Introduction to Memory Optimization

Prerequisite Knowledge

1. Exposure to C or C++

2. Exposure to computer architecture

Page 4: Introduction to Memory Optimization

Thank you to SCEA Santa Monica Studio

1. Christer Ericson
   1. VP, Central Technology, Activision
   2. Previously, Director of Technology, SCEA
   3. Author of “Real-time Collision Detection”
   4. Author of the original optimization presentation
   5. Authority on optimization and game physics

Page 5: Introduction to Memory Optimization

The agenda

1. “The Black Box”

2. Memory hierarchies and cache

3. Optimization techniques for instructions and data

4. Aliasing and restriction

5. Closing thoughts and further reading

Page 6: Introduction to Memory Optimization

What won’t be covered

1. Data-Oriented Design
   1. Modern object-oriented programming is polluting programmers’ minds
   2. A refocus on creating better representations and computation around data, rather than abstractions

2. SIMD or other approaches to vectorized code generation and usage
   1. Instruction-level parallelism
   2. A deep dive into the losing war between processor speed and memory speed

Page 7: Introduction to Memory Optimization

The Black Box
Challenges in the modern era of computing

Page 8: Introduction to Memory Optimization

The downward spiral of performance

1. There is now an accelerating gap between CPU and memory performance
   1. CPU speed increasing annually by ~60%
   2. Memory speed increasing annually by ~10%

2. The gap has been closed by the use of cache memory
   1. Recent renewed interest from the C++ community
   2. Unfortunately, cache is still vastly under-exploited
   3. Diminishing returns for large caches (physical locality)

3. Advances in instruction parallelism are overshadowing data performance
   1. Data consumption at run-time is astronomically high

4. Inefficient cache use equals lower performance
   1. Most obvious question: how do I increase cache utilization?
   2. Answer: cache-aware programming/programmers (you, after today’s slides)

Page 9: Introduction to Memory Optimization

Memory hierarchies and cache
A look at current architectures

Page 10: Introduction to Memory Optimization

An overview of cache

1. Memory hierarchy
   1. Discrete instruction cache
   2. Discrete data cache

2. Cache lines
   1. Cache is physically divided into cache lines of N bytes (typically 32 or 64 bytes) each
   2. The discrete unit for counting memory accesses

3. Example architecture – direct mapping
   1. For an n-byte cache, bytes at addresses k, k+n, k+2n, … map to the same cache line

4. Example architecture – N-way associative
   1. A logical cache line corresponds to N physical lines
   2. Minimizes cache thrashing
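The direct-mapped scheme above can be sketched in a few lines of C++. The line size and line count below are illustrative values, not figures from the slides:

```cpp
#include <cstdint>

// Sketch of direct mapping: an address is reduced to a cache line index.
// kLineSize and kLineCount are assumed example values (a 32 KB cache of
// 64-byte lines), not constants from any particular processor.
constexpr std::uint64_t kLineSize  = 64;   // bytes per cache line
constexpr std::uint64_t kLineCount = 512;  // 32 KB / 64 B

constexpr std::uint64_t cache_line_index(std::uint64_t address) {
    // Addresses exactly kLineSize * kLineCount apart land on the same
    // line -- the collision behind conflict misses and thrashing.
    return (address / kLineSize) % kLineCount;
}
```

Two addresses one whole cache apart (here, 32 KB) collide on line 0, which is exactly the conflict-miss scenario described on a later slide.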

Page 11: Introduction to Memory Optimization

Theoretical memory hierarchy

CPU: 1 cycle

L1 cache: ~2-5 cycles

L2 cache: ~5-20 cycles

Main memory: ~40-100 cycles

Page 12: Introduction to Memory Optimization

Example cache specifications

1. Emergence of L3 cache for high-end processors
2. Nothing magical about speed: strict physical-locality requirements relative to the co-processors and main memory

                L1 cache (I & D)    L2 cache
PlayStation 4   256 KB              4 MB
Wii U           64 KB               3 MB
Xbox One        256 KB              4 MB
PC              512 KB              6 MB

Page 13: Introduction to Memory Optimization

Beware the three C’s of cache misses

1. Compulsory misses
   1. Unavoidable misses when reading data for the first time

2. Capacity misses
   1. Not enough cache space to hold all active data
   2. Too much data accessed between successive uses

3. Conflict misses
   1. Two blocks of memory map to the same location and there is not enough room to hold both, ultimately causing thrashing

Page 14: Introduction to Memory Optimization

Introduce the three R’s into your program

1. Rearrange (code and data)
   1. Change layouts to increase spatial locality

2. Reduce (size and number of cache lines read)
   1. Create smaller and smarter formats
   2. Compression

3. Reuse (cache lines)
   1. Increase temporal and spatial locality

Page 15: Introduction to Memory Optimization

Instruction and data cache optimization
Strategies for performance and cache-awareness

Page 16: Introduction to Memory Optimization

Instruction optimization strategy

1. Locality
   1. Reorder functions
      1. Manually within the file
      2. Reorder object files during the linker stage
      3. Visual Studio intrinsic: #pragma section("section-name" [attributes])
   2. Adapt coding style
      1. Balance between monolithic functions and separation of logic
      2. Encapsulation and OOP are less cache friendly – usually, not always
   3. Implicit code generation
      1. Example (casting: cvttss2si)
      2. Study the code that your compiler generates
      3. Build intuition regarding how the compiler optimizes

Page 17: Introduction to Memory Optimization

Instruction optimization strategy … continued

1. Size
   1. Beware inlining, unrolling, and large macros!
      1. Always understand the cost-value tradeoffs of programming decisions
      2. Avoid unnecessary features and code paths
      3. Loop splitting and loop compounding

2. Again, always study the generated code.

Page 18: Introduction to Memory Optimization

Data optimization strategy

1. Compress data
   1. Does not necessarily mean compression algorithms
   2. Can you store more in less?

2. Cache-conscious data layouts
   1. Padding to align to cache lines
   2. Reordering to align to cache lines
   3. Ordering variables by personal preference has no value

3. Linearizing data
   1. Array-based data structures

Page 19: Introduction to Memory Optimization

Structure field data reordering

Data that is likely to be accessed together should be stored together
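The slide's struct diagram is not reproduced in this transcript; a sketch of the idea, with hypothetical field names, might look like this:

```cpp
#include <cstddef>

// Hypothetical example: fields grouped by access pattern, not by habit.
// Before: hot fields separated by a large cold member, so the per-frame
// loop touches two distant regions of the object.
struct ParticleScattered {
    float pos_x, pos_y, pos_z;
    char  debug_name[64];      // cold: only read by tooling
    float vel_x, vel_y, vel_z;
};

// After: the fields the update loop reads together sit in one contiguous
// run of bytes, so they are likely to share a cache line.
struct ParticleGrouped {
    float pos_x, pos_y, pos_z;
    float vel_x, vel_y, vel_z; // hot fields stored together
    char  debug_name[64];      // cold data pushed to the end
};
```

In the grouped layout, all 24 hot bytes fit comfortably inside a single 64-byte cache line.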

Page 20: Introduction to Memory Optimization

Be aware of compiler padding

1. What are the values of size_one and size_two?

2. Hint: not the same, so how is member data aligned?

3. Ordering member data by personal aesthetics is the result of bad programmer habits.
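The slide's size_one/size_two structs are not shown in this transcript; the following sketch uses assumed members to illustrate the question. The stated sizes are typical for platforms where a 32-bit integer is 4-byte aligned:

```cpp
#include <cstdint>

// Sketch: the same four members, two different sizes.
struct Padded {            // each char forces 3 bytes of padding before
    char         a;        // the next 4-byte-aligned int32
    std::int32_t b;
    char         c;
    std::int32_t d;
};                         // typically 16 bytes

struct Reordered {         // same members, sorted largest-first
    std::int32_t b;
    std::int32_t d;
    char         a;
    char         c;
};                         // typically 12 bytes (2 bytes of tail padding)
```

Four identical members, yet one ordering wastes a third more memory per instance, and per cache line.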

Page 21: Introduction to Memory Optimization

“Hot and cold” data division

1. Achieve much better cache coherence by striving for temporal locality among data members.

2. How often is your data in cache? Data access must scale towards the most common case, not the worst case.

Page 22: Introduction to Memory Optimization

“Hot and cold” data division … continued
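The slide's code is not reproduced in the transcript; a minimal sketch of a hot/cold split, with hypothetical member names, could be:

```cpp
#include <memory>
#include <string>

// Cold data: rarely touched, moved out of the hot object entirely.
struct GameObjectCold {
    std::string name;
    std::string description;
};

// Hot data: what the per-frame code actually reads stays small and
// cache-dense; the cold bytes cost only one pointer here.
struct GameObject {
    float x, y, z;
    float health;
    std::unique_ptr<GameObjectCold> cold;
};
```

An array of these hot objects packs more than twice as many per cache line as it would with the strings inlined, which is the temporal-locality win the slide describes.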

Page 23: Introduction to Memory Optimization

Linearization of data

1. Nothing is better than linear data
   1. Best overall spatial locality: values sit right next to each other
   2. Easy to pre-fetch, resulting in a better cache-line hit probability

2. What if my data can’t easily be represented linearly?
   1. Linearize at run-time; there is no excuse
   2. Fetch and store into a custom cache
   3. Great candidates for linearization:
      1. Hierarchy traversal
      2. Indexed data
      3. Random-access data

Page 24: Introduction to Memory Optimization

Matrix multiplication example

1. The result of bad programmer habits and programming toward an assumed general case
   1. How can it be better?
   2. What options do we have?
   3. How can we save ourselves?

Page 25: Introduction to Memory Optimization

Matrix multiplication example … continued

1. But wait! There is a hidden assumption: result, lhs, and rhs do not alias one another

2. The compiler does not and cannot know this; more on this later
   1. Line 3: lhs[0][0] and lhs[0][1] must be re-fetched
   2. Line 4: rhs[0][0] and rhs[1][0] must be re-fetched
   3. Line 5: lhs[1][0], lhs[1][1], rhs[0][1], and rhs[1][1] must be re-fetched

3. We can do even better
   1. Let’s try unrolling the multiplication

Page 26: Introduction to Memory Optimization

Matrix multiplication example … continued

1. Cache all inputs; leave no room for unneeded indirection
2. Write to each needed memory location only once
3. Result
   1. No branches
   2. No conditionals
   3. No aliasing
   4. No side effects

Real Example

Page 27: Introduction to Memory Optimization

Aliasing and restricted pointers
Run-time costs the compiler will never tell you

Page 28: Introduction to Memory Optimization

What is aliasing?

Aliasing is multiple references to the same storage location

What value is returned? Is it 1 or 2? Nobody knows.
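The slide's code is not reproduced in this transcript; the classic form of the question is a sketch like this (function and parameter names assumed):

```cpp
// If a and b point at different ints, the return value is 1.
// If they alias the same int, the write through b clobbers *a,
// and the return value is 2. The compiler must assume the worst
// and reload *a after the second store.
int write_both(int* a, int* b) {
    *a = 1;
    *b = 2;
    return *a;
}
```

Neither the compiler nor a human reader can answer "1 or 2?" from the function body alone; the answer depends entirely on the call site.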

Page 29: Introduction to Memory Optimization

Penalties for introducing abstractions

1. Higher levels of abstraction have a negative effect on optimization
   1. Object-oriented code naturally inclines programmers toward cache obliviousness
   2. “Information hiding” is a key principle, potentially hiding insights into achieving optimal performance

2. Inevitably lots of temporary objects

3. Objects live on the heap and the stack
   1. Subject to aliasing problems
   2. Constant indirection to access and transform any meaningful data

4. Implicit aliasing through the this pointer
   1. Member variables are just as bad as globals

Page 30: Introduction to Memory Optimization

Penalties for introducing abstractions … continued

1. m_count is a member, not a local variable, so it is accessed through the implicit this pointer
2. m_count may be aliased by m_ptr
3. Every iteration is likely to re-fetch m_count from main memory

Page 31: Introduction to Memory Optimization

Penalties for introducing abstractions … continued

Are you sure the compiler does this optimization for you? Don’t leave it up to chance.
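The slide's class is not reproduced in this transcript; a sketch reconstructing the m_count/m_ptr situation, with assumed signatures, and the explicit local-copy fix:

```cpp
// Hypothetical reconstruction of the slide's example.
class Buffer {
public:
    Buffer(int* ptr, int count) : m_ptr(ptr), m_count(count) {}

    void clear_slow() {
        // m_ptr[i] = 0 might (as far as the compiler knows) overwrite
        // m_count through this, so m_count is re-read every iteration.
        for (int i = 0; i < m_count; ++i)
            m_ptr[i] = 0;
    }

    void clear_fast() {
        // Copying the member into a local severs the possible alias:
        // the loop bound can now live in a register.
        const int count = m_count;
        for (int i = 0; i < count; ++i)
            m_ptr[i] = 0;
    }

private:
    int* m_ptr;
    int  m_count;
};
```

Both functions produce the same result; the difference is only in what the compiler can prove, which is exactly why the explicit version should not be left to chance.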

Page 32: Introduction to Memory Optimization

Restricted pointers

1. The restrict keyword
   1. Supported by many C++ compilers (MSVC, GCC)
   2. Controversial among standards committees

2. Restrict is a promise
   1. Tells the compiler that, for the scope of the pointer, the pointer's target location will be accessed through that pointer alone. It is a promise not to alias.

3. Important in C++
   1. Helps combat abstraction-penalty problems
   2. Tricky semantics, easy to get wrong
   3. The compiler will never inform you about incorrect usage
   4. Incorrect usage results in agonizing pain

Page 33: Introduction to Memory Optimization

Restricted pointers … continued

What you really want is for the compiler to generate this

… But because of aliasing, the compiler cannot do it

Page 34: Introduction to Memory Optimization

Restricted pointers … continued

The fix? Restrict the pointers

1. Prefer an explicit coding style; leave nothing to chance
2. Be careful and pragmatic; understand what code paths can be taken with functions
3. Remember, a restrict-qualified pointer can grant access to a non-restrict pointer

Page 35: Introduction to Memory Optimization

Restricted pointers … continued

Remember: despite intuition, “const” doesn’t help

1. “Wait, since *rhs is const, lhs[i] cannot write to it, right?” … WRONG
2. const promises that *rhs is const through rhs, NOT that *rhs is const in general
3. const is for detecting programming errors, not for fixing aliasing
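The point can be demonstrated with a small sketch (identifiers assumed, not from the slides): a write through one name legally changes a value read through a pointer-to-const name.

```cpp
// rhs is a pointer to const, yet the write through lhs may still change
// *rhs if the two names alias -- const constrains rhs, not the object.
int demo(const int* rhs, int* lhs) {
    int before = *rhs;
    lhs[0] = 42;           // legal even when lhs aliases rhs
    return *rhs - before;  // non-zero exactly when they alias
}
```

Because of this, the compiler must still reload *rhs after the store, const or not.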

Page 36: Introduction to Memory Optimization

Tips for avoiding aliasing

1. Minimize use of global pointers and references
   1. Recall the semantics of the Matrix example
   2. Pass small variables by value
   3. Use local variables as much as possible

2. Restrict pointers and references when appropriate

3. Declare variables close to the point of use, and no further

4. Aim to write “pure” functions, and strive for const-correctness

5. Study generated code!

Page 37: Introduction to Memory Optimization

Optimization isn’t magic

1. Strive for explicitness in programming
   1. Leave no room for unintended side effects

2. Understand the hardware architecture being targeted
   1. Constant factors in programming matter
   2. Relevant for all platforms
      1. Game consoles
      2. Mobile
      3. Servers
      4. … Even normal desktops/laptops

3. Not over; many more topics to explore
   1. Branch prediction
   2. SIMD and vectorized code
   3. Cache-aware data structures

Page 38: Introduction to Memory Optimization

Further reading and references

1. Abrash, Michael. Zen of Code Optimization. Scottsdale, AZ: Coriolis Group, 1994. Print.

2. Ericson, Christer. Real-time Collision Detection. Amsterdam: Elsevier, 2005. Print.

3. Fabian, Richard. "Data-Oriented Design." Data-Oriented Design. N.p., 25 June 2013. Web. 03 Apr. 2014.

4. Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann, 2003. Print.

Page 39: Introduction to Memory Optimization

Thank you, any questions?