Benchmark Instrumentation, Paraver
Transcript
Page 1: M&t presentation

BENCHMARK

INSTRUMENTATION

Umit Cavus BUYUKSAHIN Measurement Tools & Techniques, Spring ’12

4/17/2012

Page 2: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 3: M&t presentation

NAS Benchmark Suite

• NAS ... is a set of benchmarks.

... evaluates the performance of highly parallel supercomputers.

... is developed and maintained by the NASA Advanced Supercomputing (NAS) Division.


Page 4: M&t presentation

NAS Benchmark Suite

• NAS Kernel Applications

• IS - Integer Sort

• EP - Embarrassingly Parallel

• CG - Conjugate Gradient

• MG - Multi-Grid

• FT - discrete 3D fast Fourier Transform

• Problem Sizes

• S : small size

• W : workstation size

• A, B, C : standard test sizes; each ~4× larger than the previous

• D, E, F : large test sizes; each ~16× larger than the previous


Page 5: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 6: M&t presentation

Experiments

• NAS Parallel Benchmark version 3.2.1

• IS Kernel Application: ... sorts N keys in parallel.

... tests

• integer computation speed

• communication performance

• S Problem Size: ... is small, for quick test purposes

... has 2^16 keys


Page 7: M&t presentation

Experiments

• IS Benchmarking Procedure (in general; a minimal sketch follows this list)

1. Generating a sequence of N keys

2. Loading the N keys into the memory system

3. Time begins

4. Loop

Sorting & partial verification

5. Time ends

6. Full verification.
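To make the timed region concrete, here is a minimal, hypothetical C/MPI sketch of the structure above. It is not the actual NPB IS source: the key count, iteration count, and the purely local sort and verification are assumptions chosen only to show which steps fall inside the measured interval (steps 3-5).

    /* Hypothetical sketch of the IS timing structure (not the NPB source):
     * keys are generated and loaded first, only the sort/verify loop is timed,
     * and full verification happens after the clock has stopped. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_KEYS   (1 << 16)   /* assumption: class S size, 2^16 keys */
    #define ITERATIONS 10          /* assumption: number of timed repetitions */

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Steps 1-2: generate this rank's share of the keys and load them. */
        int local_n = NUM_KEYS / size;
        int *keys = malloc(local_n * sizeof *keys);
        srand(rank + 1);
        for (int i = 0; i < local_n; i++) keys[i] = rand() % NUM_KEYS;

        MPI_Barrier(MPI_COMM_WORLD);
        double t_begin = MPI_Wtime();                 /* Step 3: time begins */

        for (int it = 0; it < ITERATIONS; it++) {     /* Step 4: loop */
            qsort(keys, local_n, sizeof *keys, cmp_int);
            /* Partial verification: spot-check the ordering of the local data. */
            if (local_n > 1 && keys[0] > keys[local_n - 1])
                fprintf(stderr, "rank %d: partial verification failed\n", rank);
        }

        double t_end = MPI_Wtime();                   /* Step 5: time ends */

        /* Step 6: full verification (local only in this sketch). */
        int ok = 1;
        for (int i = 1; i < local_n; i++)
            if (keys[i - 1] > keys[i]) ok = 0;

        if (rank == 0)
            printf("benchmarking time: %.6f s, verification: %s\n",
                   t_end - t_begin, ok ? "passed" : "FAILED");

        free(keys);
        MPI_Finalize();
        return 0;
    }

Only the interval between t_begin and t_end counts as benchmarking time; key generation, loading, and the final full verification fall outside the measured region.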


Page 8: M&t presentation

Experiments

Machines:

• My Computer

i686 GNU/Linux

3 GB RAM

2 CPUs at 800 MHz

• Boada

x86_64 GNU/Linux

24 GB RAM

24 CPUs at 1596 MHz


Page 9: M&t presentation

Experiments

Procedure:

• The benchmark is not manually instrumented.

• Paraver traces are generated automatically.

• LD_PRELOAD is exported so that the tracing library is preloaded into the benchmark processes (see the sketch after this list).

• Benchmarks are executed with 2, 4, 8, 16, 32, and 64 processors.

• Benchmark results are analyzed.

• Generated traces are examined in the Paraver tool.
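For background on the LD_PRELOAD step, the sketch below illustrates the general mechanism behind automatic tracing: a preloaded shared library interposes on MPI calls through the standard PMPI profiling interface, records an event, and forwards the call to the real implementation. This is a generic, hypothetical shim for illustration only, not the library that actually produced the Paraver traces.

    /* trace_shim.c - hypothetical LD_PRELOAD interposition sketch (illustration only). */
    #include <mpi.h>
    #include <stdio.h>

    /* Override MPI_Send: measure how long the call takes, then forward it to the
     * real implementation through the PMPI profiling interface. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        fprintf(stderr, "MPI_Send to rank %d took %.6f s\n", dest, MPI_Wtime() - t0);
        return rc;
    }

Compiled as a shared object and exported through LD_PRELOAD, such a shim is loaded ahead of the MPI library, so the benchmark runs unmodified while its MPI activity is recorded; a full tracer such as Extrae applies the same idea to the whole MPI API and writes Paraver trace files instead of text.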


Page 10: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 11: M&t presentation

Paraver Visualization – Code View

• My Computer

• Boada


Page 12: M&t presentation

Paraver Visualization – Communication

• My Computer

• Boada


Page 13: M&t presentation

Paraver Visualization – Disk I/O

• My Computer

• Boada


Page 14: M&t presentation

Paraver Visualization – Load Balance

• My Computer



Page 15: M&t presentation

Paraver Visualization – Load Balance

• Boada



Page 16: M&t presentation

Paraver Visualization – LD1 Cache Miss

• My Computer


Page 17: M&t presentation

Paraver Visualization – LD1 Cache Miss

• Boada


Page 18: M&t presentation

Paraver Visualization – CPI

• My Computer


Page 19: M&t presentation

Paraver Visualization – CPI

• Boada


Page 20: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 21: M&t presentation

Execution Time


[Chart: execution time in ms (0-16000) vs. number of processors (2-64), for MyComputer and Boada]

Page 22: M&t presentation

Execution Time

• Relative Speedup = (Execution Time on MyComputer) / (Execution Time on Boada)

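Since MyComputer's time is in the numerator, a value above 1 means Boada is that many times faster. As a purely illustrative calculation with made-up numbers (not the measured results): if MyComputer needed 12000 ms and Boada 300 ms at the same processor count, the relative speedup would be 12000 / 300 = 40.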

[Chart: relative speedup (0-60) vs. number of processors (1-64)]

Page 23: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 24: M&t presentation

Benchmarking Time - reminder

• IS Benchmarking Procedure (generally)

1. Generating a sequence of N keys

2. Loading the N keys into the memory system

3. Time begins

4. Loop

Sorting & partial verification

5. Time ends

6. Full verification.

• Benchmarking Time = the execution time of the parallel algorithm (the interval between steps 3 and 5 above)


Page 25: M&t presentation

Benchmarking Time


[Chart: benchmarking time in seconds (0-2.0) vs. number of processors (1-64), for MyComputer and Boada]

Page 26: M&t presentation

Benchmarking Time

• Relative Speedup = (Benchmarking Time on MyComputer) / (Benchmarking Time on Boada)


[Chart: relative speedup (0-70) vs. number of processors (1-64)]

Page 27: M&t presentation

Benchmarking Time


• SpeedUp of My Computer

[Chart: speedup of MyComputer (0-1.2) vs. number of processors (1-64)]

Page 28: M&t presentation

Benchmarking Time


• SpeedUp of Boada

[Chart: speedup of Boada (0-7) vs. number of processors (1-64)]

Page 29: M&t presentation

OUTLINE

• NAS Benchmark Suite

• Experiments

• Paraver Visualization

• Code View

• Communication

• Disk I/O

• Load Balancing

• LD1 Cache Miss

• Cycles per Instruction (CPI)

• Execution Time

• Benchmarking Time

• Conclusion


Page 30: M&t presentation

Conclusion

• IS application

• ... does not involve much communication.

• ... is based on computation and memory loading.

• ... has low cache-miss and high CPI values in the computation phase.

• NAS is designed for highly parallel supercomputers.

• MyComputer is inadequate to meet the requirements of NAS.

• MyComputer cannot speed up in this application.

• Boada speeds up until the number of processes reaches the number of processors it has.

• MyComputer saves less time on disk I/O operations.

• CPI values in Boada’s computation phase are lower.


Page 31: M&t presentation

BENCHMARK

INSTRUMENTATION

Umit Cavus BUYUKSAHIN Measurement Tools & Techniques, Spring ’12

4/17/2012