Paper Report

Presenter: Zong Ze-Huang

Fast and Accurate Resource Conflict Simulation for Performance Analysis of Multi-Core Systems

Stattelmann, S.; Bringmann, O.; Rosenstiel, W.

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011

Dec 31, 2015

Allan Joseph

Page 2: Abstract

This work presents a SystemC-based simulation approach for fast performance analysis of parallel software components, using source code annotated with low-level timing properties. In contrast to other source-level approaches for performance analysis, timing attributes obtained from binary code can be annotated even if compiler optimizations are used, without requiring changes to the compiler. To consider concurrent accesses to shared resources like caches accurately during a source-level simulation, an extension of the SystemC TLM-2.0 standard for reducing the necessary synchronization overhead is proposed as well. This enables the simulation of low-level timing effects without performing a full-fledged instruction set simulation and at speeds close to pure native execution.

Page 3: What is the Problem

- Estimating software performance at an early design stage is often crucial, but estimating the performance of software-intensive systems is a complex task.
- An instruction set simulator (ISS) can be used to estimate the performance of binary code, but it simulates slowly.
- In multi-core systems, software components also interact through concurrent accesses to shared resources.

Proposed methods

- A fast and accurate approach for system-level performance analysis.
- To estimate execution times without using an ISS, software components are annotated with timing properties at the source level.
- A Quantum Giver synchronization approach is presented: an extension of the SystemC TLM-2.0 standard that reduces the synchronization overhead.

Page 4: Related work

Fast and Accurate Resource Conflict Simulation for Performance Analysis of Multi-Core Systems

This paper:

[11]

[12], [13]

[7], [14], [15]

Dynamic performance evaluation using source code

Compiler optimizations are supported through the use of modified compiler tool chains

Describes a cache simulation specifically developed for non-functional simulation

[17]

The concept of Result-Oriented Modeling is similar to the Quantum Giver approach

[19]

Increase simulation performance of TLM-2.0 models

Provide solutions for an efficient temporally decoupled simulation of shared caches

Reduce additional context switches

Page 5: Loosely-timed coding style

Loosely-timed (TLM-LT) is one of the TLM-2.0 coding styles. It is aimed at speed ("fast") and suited to early software development. The other coding style is approximately-timed (TLM-AT), aimed at accuracy ("accurate").

TLM-LT uses temporally decoupled simulation to increase simulation performance: parts of the simulation that do not interact frequently with their surrounding environment may run ahead of the current simulation time for a short amount of time.

This avoids unnecessary kernel synchronization points and context switches.

As soon as the local time offset of a temporally decoupled process reaches the global time quantum, the process must synchronize its local time with the global simulation time by calling wait().
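The synchronization rule above can be sketched in a few lines. This is a minimal Python model, not SystemC: `DecoupledProcess`, `consume()`, `sync()` and `run()` are hypothetical names, loosely modeled on the quantum keeper utility (`tlm_utils::tlm_quantumkeeper`) that accompanies TLM-2.0, and times are plain integers (for example nanoseconds). Each call to `sync()` stands in for one `wait()` on the SystemC kernel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoupledProcess:
    """Hypothetical model of one temporally decoupled (TLM-LT style) process.

    Times are integers (e.g. nanoseconds). global_time is the kernel time at the
    last synchronization; local_offset is how far the process has run ahead of it.
    """
    quantum: int            # global time quantum
    global_time: int = 0
    local_offset: int = 0
    syncs: int = 0          # number of synchronizations (stand-ins for wait() calls)

    def consume(self, duration: int) -> None:
        # Annotated execution time only advances the local offset ...
        self.local_offset += duration
        # ... until the offset reaches the quantum, which forces a synchronization.
        if self.local_offset >= self.quantum:
            self.sync()

    def sync(self) -> None:
        # Models wait(local_offset): the kernel catches up, the offset is reset.
        self.global_time += self.local_offset
        self.local_offset = 0
        self.syncs += 1

    @property
    def local_time(self) -> int:
        return self.global_time + self.local_offset


def run(block_costs: List[int], quantum: int) -> DecoupledProcess:
    """Execute a sequence of annotated blocks under the given quantum."""
    process = DecoupledProcess(quantum=quantum)
    for cost in block_costs:
        process.consume(cost)
    return process


for q in (1, 100, 1000):
    p = run([10] * 100, quantum=q)
    print(q, p.syncs, p.local_time)
# 1 100 1000
# 100 10 1000
# 1000 1 1000
```

With a quantum smaller than any block cost, the process synchronizes after every one of the 100 blocks; a quantum of 100 cuts this to 10 synchronizations while the accumulated local time stays at 1000. That reduction is the speed gain the slide describes, and it is also why accesses to shared resources inside one quantum are no longer ordered by time.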

Page 6: Synchronization Frequency

Page 7: TLM-AT

Page 8: TLM-LT

TLM-LT enables faster simulation, but the coding style cannot replace a synchronization mechanism for resolving data dependencies between processes or accesses to shared resources.

Page 9: Lack of synchronization mechanism

Temporal decoupling on its own provides no synchronization mechanism. A process may therefore read stale data, or newly written data can be overwritten by an earlier write from another process that is scheduled afterwards.

If, for instance, an instruction or data cache is simulated, the order of accesses can determine whether an access is a cache hit or a cache miss. Synchronizing more often restores the correct order, but in the worst case, excessive synchronization degrades TLM-LT simulation performance to the level of TLM-AT.
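A small worked example of the cache problem: two cores share a direct-mapped cache, and the hit/miss outcome depends on the order in which the accesses reach it. This is a hypothetical Python model; `simulate_cache`, the 4-line cache, the trace format and the addresses are invented for illustration, and addresses are treated directly as cache-line numbers. Core A is assumed to be scheduled first and to run ahead within its quantum, so its access at time 20 is simulated before core B's access at time 10.

```python
from typing import List, Optional, Tuple

# (timestamp, core, cache-line address): a hypothetical trace format for this sketch
Access = Tuple[int, str, int]


def simulate_cache(accesses: List[Access], num_lines: int = 4) -> List[str]:
    """Feed accesses to a direct-mapped cache in the given order; return hit/miss per access."""
    lines: List[Optional[int]] = [None] * num_lines
    outcomes = []
    for _time, _core, addr in accesses:
        index = addr % num_lines          # 0x100 and 0x200 map to the same line
        if lines[index] == addr:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            lines[index] = addr           # evict whatever was cached there
    return outcomes


core_a = [(0, "A", 0x100), (20, "A", 0x100)]
core_b = [(10, "B", 0x200)]

# Temporal decoupling without synchronization: core A issues all accesses of
# its quantum before core B runs.
decoupled = simulate_cache(core_a + core_b)
# Timestamp order, as a simulation that synchronizes before every access sees it.
in_time_order = simulate_cache(sorted(core_a + core_b))

print(decoupled)       # ['miss', 'hit', 'miss']
print(in_time_order)   # ['miss', 'miss', 'miss']
```

The decoupled run reports a hit that the time-ordered run does not, because B's access at time 10 should have evicted A's line before A's second access. Forcing time order for every access removes the error, at the cost of one synchronization per access.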

Page 10: Access Synchronization of Shared Resources

A Quantum Giver synchronization approach is presented. It consists of three phases:

1) Simulation Phase
- Processes are simulated using temporal decoupling.
- All transactions issued by an initiator are completed immediately.
- After the initiators have reached the maximal local time offset, the synchronization phase is executed.

2) Synchronization Phase
- All target components order the transactions they received during the simulation phase.
- They detect any changes in the previously predicted times of transactions caused by conflicts.
- These changes are then broadcast to all other target components, possibly triggering further changes, until all components have reached a stable state.

3) Scheduling Phase
- The Quantum Giver creates SystemC events to wake up the respective processes.
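The three phases can be illustrated with a deliberately simplified model: a single shared target (for example a bus or cache port) that is busy for a fixed `service` time per transaction. This is a sketch under stated assumptions, not the authors' implementation: `synchronize` and `schedule` are hypothetical names, issue times per initiator are assumed to be non-decreasing, and with only one target the "broadcast until stable" step reduces to serializing transactions in time order while pushing each initiator's later transactions back by the delay it has already accumulated.

```python
import heapq
from typing import Dict, List, Tuple


def synchronize(requests: Dict[str, List[int]], service: int) -> Dict[str, int]:
    """Synchronization phase for one shared target (simplified, hypothetical model).

    requests: per initiator, the non-decreasing local issue times recorded during
              the simulation phase, where every transaction completed immediately
              as if the target were never busy.
    service:  time the target is occupied by one transaction.
    Returns the conflict delay accumulated by each initiator.
    """
    delay = {name: 0 for name in requests}
    # Pending transactions ordered by (corrected issue time, initiator, index).
    pending: List[Tuple[int, str, int]] = [
        (times[0], name, 0) for name, times in requests.items() if times
    ]
    heapq.heapify(pending)
    target_free = 0
    while pending:
        issue, name, i = heapq.heappop(pending)
        start = max(issue, target_free)   # wait if the target is still busy
        delay[name] += start - issue      # conflict delay; 0 if there was no conflict
        target_free = start + service
        if i + 1 < len(requests[name]):
            # A delay also postpones every later transaction of the same initiator.
            heapq.heappush(pending, (requests[name][i + 1] + delay[name], name, i + 1))
    return delay


def schedule(global_time: int, local_offsets: Dict[str, int],
             delay: Dict[str, int]) -> List[Tuple[int, str]]:
    """Scheduling phase: corrected wake-up time per process, earliest first.

    In SystemC this would be one event notification per process; here we only
    return the times.
    """
    return sorted((global_time + local_offsets[n] + delay[n], n) for n in local_offsets)


# Simulation phase (already done): both initiators ran ahead by 20 time units.
requests = {"A": [0, 10], "B": [0, 5]}
delay = synchronize(requests, service=4)
print(delay)                                      # {'A': 3, 'B': 4}
print(schedule(100, {"A": 20, "B": 20}, delay))   # [(123, 'A'), (124, 'B')]
```

In the example, B's first transaction collides with A's and is delayed by 4; that delay moves B's second request from 5 to 9, which in turn delays A's second request by 3. Both processes are then woken up later than the time 120 they predicted on their own, which is the kind of correction an unsynchronized TLM-LT run would miss.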

Page 11: Phases of Synchronization Protocol

Page 12: Source-Level Simulation of Machine Code

Using a mapping between the source code and the binary code, the source code can be annotated with information about its timing behavior before it is used in the simulation model.

If an optimizing compiler is used, the compiler-generated debug information might be incorrect. These inconsistencies must be eliminated by reconstructing the relation between basic blocks in the binary code and source code lines.

The commercial tool AbsInt aiT was integrated into the analysis flow to produce a binary-level control flow graph annotated with execution times.
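The annotation idea can be shown on a toy function: each basic block of the source is instrumented with a call that adds that block's cycle count, so timing accumulates while the code runs natively instead of on an ISS. In this Python sketch the block names and cycle counts in `BLOCK_CYCLES` are invented placeholders for values a binary-level timing analysis would provide, `TimingAnnotator` and `annotated_sum` are hypothetical names, and the debug-information repair step is assumed to have already produced a correct block-to-source mapping.

```python
from typing import Dict, List

# Placeholder cycle counts per basic block. In the flow described above these
# would come from a binary-level CFG annotated with execution times and then be
# mapped back to source lines; the numbers here are made up.
BLOCK_CYCLES: Dict[str, int] = {"entry": 5, "loop_body": 12, "exit": 3}


class TimingAnnotator:
    """Accumulates the estimated cycles of the blocks executed so far."""

    def __init__(self) -> None:
        self.cycles = 0

    def consume(self, block: str) -> None:
        self.cycles += BLOCK_CYCLES[block]


def annotated_sum(values: List[int], timing: TimingAnnotator) -> int:
    """Original functionality (a sum) plus inserted timing annotations."""
    timing.consume("entry")
    total = 0
    for v in values:
        timing.consume("loop_body")   # annotation inserted for the loop body block
        total += v
    timing.consume("exit")
    return total


timing = TimingAnnotator()
result = annotated_sum([1, 2, 3], timing)
print(result, timing.cycles)   # 6 44
```

In a SystemC model, the accumulated cycles would be converted to time and added to the process's local time offset; that is the point where temporal decoupling and the Quantum Giver synchronization come into play.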

Page 13: Analysis and Instrumentation Work Flow

Page 14: How to evaluate the proposal

Three synchronization approaches were compared to test speed and accuracy:
1. No synchronization, meaning each access to the instruction cache was executed directly during TLM-LT simulation.
2. Synchronization using the Quantum Giver approach.
3. Explicit synchronization using calls to wait() before each cache access, which corresponds to TLM-AT (lock-step simulation).

Page 15: Experimental Results - speed

With the Quantum Giver approach, simulation performance improves by about 25% compared with lock-step simulation.

Page 16: Experimental Results - accuracy

The estimates obtained with the Quantum Giver synchronization approach are very close to those reported by lock-step simulation, and more accurate than those of simulation without synchronization.