Dynamic Floating-Point Error Detection
Mike Lam, Jeff Hollingsworth and Pete Stewart
University of Maryland


Feb 25, 2016

Page 1: Dynamic Floating-Point Error Detection

University of Maryland

Dynamic Floating-Point Error Detection

Mike Lam, Jeff Hollingsworth and Pete Stewart

Page 2: Dynamic Floating-Point Error Detection

Motivation
• Finite precision -> roundoff error
  • Compromises ill-conditioned calculations
  • Hard to detect and diagnose
• Increasingly important as HPC grows
  • Single-precision is faster on GPUs
  • Double-precision fails on long-running computations
• Previous solutions are problematic
  • Numerical analysis requires training
  • Manual re-writing and testing in higher precision is tedious and time-consuming

Page 3: Dynamic Floating-Point Error Detection

Our Solution
• Instrument floating-point instructions
• Automatic
  • Minimize developer effort
  • Ensure analysis consistency and correctness
• Binary-level
  • Include shared libraries w/o source code
  • Include compiler optimizations
• Runtime
  • Data-sensitive

Page 4: Dynamic Floating-Point Error Detection

Our Solution
• Three parts
  • Utility that inserts binary instrumentation
  • Runtime shared library with analysis routines
  • GUI log viewer
• General overview
  • Find floating-point instructions and insert calls to shared library
  • Run instrumented program
  • View output with GUI

Page 5: Dynamic Floating-Point Error Detection

Our Solution
• Dyninst-based instrumentation
  • Cross-platform
  • No special hardware required
  • Stack walking and binary rewriting
• Java GUI
  • Cross-platform
  • Minimal development effort

Page 6: Dynamic Floating-Point Error Detection

Our Solution
• Cancellation detection
  • Instrument addition & subtraction
  • Compare runtime operand values
  • Report cancelled digits
• Side-by-side (“shadow”) calculations
  • Instrument all floating-point instructions
  • Higher/lower precision
  • Different representation (e.g. rationals)
  • Report final errors
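The side-by-side idea can be illustrated with a minimal sketch, assuming Python's exact `fractions.Fraction` as the alternate representation. The function name `shadowed_sum` is hypothetical, and this works at the source level, whereas the tool described in the slides instruments binaries. Each native operation is mirrored on a shadow value, and the final error is reported at the end.

```python
from fractions import Fraction

def shadowed_sum(terms):
    """Sum terms in double precision and, side by side, in exact rationals."""
    native = 0.0
    shadow = Fraction(0)
    for t in terms:
        native += t            # the original floating-point operation
        shadow += Fraction(t)  # same operation on the shadow value (exact)
    # Final error: distance between the native result and the shadow result.
    abs_error = abs(Fraction(native) - shadow)
    return native, shadow, float(abs_error)

native, shadow, err = shadowed_sum([0.1] * 10)
print("native result :", native)
print("shadow result :", float(shadow))
print("absolute error:", err)
```

Because `Fraction(t)` converts each double exactly, the reported error covers only the roundoff accumulated by the additions, not the representation error of the inputs.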

Page 7: Dynamic Floating-Point Error Detection

Cancellation Detection
• Overview
  • Loss of significant digits during operations
• For each addition/subtraction:
  • Extract value of each operand
  • Calculate result and compare magnitudes (binary exponents)
  • If e_ans < max(e_x, e_y) there is a cancellation
• For each cancellation event:
  • Record a “priority”: max(e_x, e_y) - e_ans
  • Save event information to log
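The exponent test above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: `cancelled_bits` is a hypothetical name, binary exponents come from `math.frexp`, and a result of exactly zero is treated as cancelling the full significand, a case the slides do not specify.

```python
import math
import sys

def binary_exponent(v: float) -> int:
    """Return e such that v = m * 2**e with 0.5 <= |m| < 1."""
    return math.frexp(v)[1]

def cancelled_bits(x: float, y: float) -> int:
    """Priority of the cancellation in x + y, or 0 if there is none.

    For a subtraction x - y, call cancelled_bits(x, -y).
    """
    if x == 0.0 or y == 0.0:
        return 0  # adding zero cannot cancel anything
    result = x + y
    if result == 0.0:
        # Assumption: total cancellation counts as the full significand width.
        return sys.float_info.mant_dig
    e_max = max(binary_exponent(x), binary_exponent(y))
    e_ans = binary_exponent(result)
    # Cancellation occurred when the result's exponent dropped below e_max.
    return e_max - e_ans if e_ans < e_max else 0

log = []  # one entry per cancellation event
for x, y in [(1.5, -1.25), (1.0, 2.0), (1.0, -0.9999999)]:
    priority = cancelled_bits(x, y)
    if priority > 0:
        log.append({"x": x, "y": y, "priority": priority})
print(log)
```

The priority is measured in binary digits, so a value of 24 means roughly the entire significand of a single-precision number was cancelled.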

Page 8: Dynamic Floating-Point Error Detection


Page 9: Dynamic Floating-Point Error Detection


Page 10: Dynamic Floating-Point Error Detection

Gaussian Elimination
A -> [L,U]
• Comparison of eight methods
  • Classical
  • Classical w/ partial pivoting
  • Classical w/ full pivoting
  • Bordering (“Sherman’s march”)
  • “Pickett’s charge”
  • “Pickett’s charge” w/ partial pivoting
  • Crout’s method
  • Crout’s method w/ partial pivoting

Page 11: Dynamic Floating-Point Error Detection


Gaussian Elimination

Page 12: Dynamic Floating-Point Error Detection


Gaussian Elimination

Classical vs. Bordering

Page 13: Dynamic Floating-Point Error Detection

Gaussian Elimination

                 Classical   Bordering
Operations       285         294
Cancellations    39          9
Cancels/ops      14%         3%
Average bits     5.23        22.78
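Statistics of this kind could be gathered as in the following sketch, which counts subtractions and cancellations during classical Gaussian elimination without pivoting. Everything here is an assumption for illustration: the name `count_cancellations`, the random diagonally dominant test matrix, and the choice to skip exact-zero results. It does not reproduce the slide's data. A 10x10 matrix yields 285 subtractions, which happens to equal the classical operation count in the table, though the slides do not state the matrix or size used.

```python
import math
import random

def count_cancellations(a):
    """Classical Gaussian elimination (no pivoting) on a copy of square matrix a.

    Returns (subtractions, cancellations, total_cancelled_bits).
    """
    a = [row[:] for row in a]
    n = len(a)
    ops = cancels = bits = 0
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]  # multiplier; assumes a nonzero pivot
            a[i][k] = m            # store L below the diagonal
            for j in range(k + 1, n):
                x, y = a[i][j], m * a[k][j]
                r = x - y
                ops += 1
                # Exact-zero operands or results are skipped for simplicity.
                if x != 0.0 and y != 0.0 and r != 0.0:
                    e_max = max(math.frexp(x)[1], math.frexp(y)[1])
                    lost = e_max - math.frexp(r)[1]
                    if lost > 0:
                        cancels += 1
                        bits += lost
                a[i][j] = r
    return ops, cancels, bits

# Hypothetical test matrix: random entries with a dominant diagonal,
# so every pivot is safely nonzero.
rng = random.Random(0)
n = 10
A = [[rng.uniform(-1.0, 1.0) + (n if i == j else 0) for j in range(n)]
     for i in range(n)]

ops, cancels, bits = count_cancellations(A)
avg_bits = bits / cancels if cancels else 0.0
print(f"operations={ops} cancellations={cancels} "
      f"cancels/ops={cancels / ops:.0%} average bits={avg_bits:.2f}")
```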

Page 14: Dynamic Floating-Point Error Detection

SPEC Benchmarks
• Results are hard to interpret without domain knowledge
• Overheads:

Page 15: Dynamic Floating-Point Error Detection

Roundoff Error
• Sparse “shadow value” table
  • Maps memory addresses to alternate values
  • Shadow values can be single-, double-, quad- or arbitrary-precision
  • Other ideas: rationals, # of significant digits, etc.
• Instrument every FP instruction
  • Extract operation type and operand addresses
  • Perform the same operation on corresponding shadow values
  • Output shadow values and errors upon termination
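A toy model of the shadow value table might look like the sketch below. It assumes a simulated memory (a dictionary keyed by made-up addresses), single-precision native values emulated with `struct`, and double-precision shadow values. The class `ShadowMachine` and its methods are hypothetical names, not part of the tool described here.

```python
import struct

def to_single(v: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

class ShadowMachine:
    def __init__(self):
        self.memory = {}  # address -> native (single-precision) value
        self.shadow = {}  # sparse table: address -> double-precision shadow

    def store(self, addr, value):
        self.memory[addr] = to_single(value)
        self.shadow[addr] = float(value)

    def _shadow_of(self, addr):
        # Sparse lookup: fall back to the native value if no shadow exists.
        return self.shadow.get(addr, self.memory[addr])

    def add(self, dst, a, b):
        """Instrumented addition: operate on native and shadow values alike."""
        s = self._shadow_of(a) + self._shadow_of(b)  # read before dst changes
        self.memory[dst] = to_single(self.memory[a] + self.memory[b])
        self.shadow[dst] = s

    def report(self):
        """Relative error of each native value against its shadow value."""
        out = {}
        for addr, s in self.shadow.items():
            native = self.memory[addr]
            out[addr] = abs(native - s) / abs(s) if s != 0.0 else abs(native)
        return out

m = ShadowMachine()
m.store(0x10, 0.0)  # accumulator
m.store(0x18, 0.1)  # increment
for _ in range(1000):
    m.add(0x10, 0x10, 0x18)

for addr, rel in m.report().items():  # output upon "termination"
    print(hex(addr), "shadow =", m.shadow[addr], "relative error =", rel)
```

Keying the table by address rather than by variable is what lets the shadow value follow data through memory without any knowledge of the source program.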

Page 16: Dynamic Floating-Point Error Detection


Page 17: Dynamic Floating-Point Error Detection

More Gaussian Elimination

Maximum relative error

                   25x25     50x50     100x100
Partial pivoting   9.3e-10   2.3e-2    1.0
Full pivoting      1.3e-15   2.4e-15   4.8e-15

Page 18: Dynamic Floating-Point Error Detection

Issues & Possible Solutions
• Expensive overheads (100-500X)
  • Optimize with inline snippets
  • Reduce workload with data flow analysis
• Following values through compiler optimizations
  • Selectively instrument MOV instructions
• Filtering false positives
  • Deduce “root cause” of error using data flow

Page 19: Dynamic Floating-Point Error Detection


Conclusion

• Analysis of floating-point error is hard

• Our tool provides automatic analysis of such error

• Work in progress

Page 20: Dynamic Floating-Point Error Detection


Thank you!