Binary-Level Tools for Floating-Point Correctness Analysis Michael Lam LLNL Summer Intern 2011 Bronis de Supinski, Mentor.


Background

• Floating-point represents real numbers as (± sgnf × 2^exp)
  o Sign bit
  o Exponent
  o Significand (“mantissa” or “fraction”)
• Floating-point numbers have finite precision
  o Single-precision: 24 bits (~7 decimal digits)
  o Double-precision: 53 bits (~16 decimal digits)

[Figure: bit layouts]
IEEE Single (32 bits): sign (1 bit), exponent (8 bits), significand (23 bits)
IEEE Double (64 bits): sign (1 bit), exponent (11 bits), significand (52 bits)
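The field layout above can be explored with a short Python sketch. The function names are hypothetical and chosen only for illustration, and the exponent interpretation assumes normal (not subnormal, infinite, or NaN) values. Normal values carry an implicit leading 1, which is why 23 and 52 stored bits give 24 and 53 bits of precision.

```python
import struct

def decompose_single(x):
    """Return (sign, unbiased exponent, stored fraction) of x as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127      # 8 exponent bits, bias 127
    fraction = bits & ((1 << 23) - 1)           # 23 stored significand bits
    return sign, exponent, fraction

def decompose_double(x):
    """Return (sign, unbiased exponent, stored fraction) of x as an IEEE double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023    # 11 exponent bits, bias 1023
    fraction = bits & ((1 << 52) - 1)           # 52 stored significand bits
    return sign, exponent, fraction

print(decompose_single(1.5))
print(decompose_double(-2.0))
```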

Example

π ≈ 3.141592…
[Figure: single-precision and double-precision bit patterns of π. Images courtesy of BinaryConvert.com]

Example

1/10 = 0.1
[Figure: single-precision and double-precision bit patterns of 0.1. Images courtesy of BinaryConvert.com]
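The bit patterns in these two examples can be reproduced with Python's standard `struct` module. This is a small sketch with hypothetical helper names; it also prints how far the single-precision value is from the double-precision one.

```python
import math
import struct

def single_hex(x):
    """Hex encoding of x rounded to an IEEE single (big-endian byte order)."""
    return struct.pack(">f", x).hex()

def double_hex(x):
    """Hex encoding of x as an IEEE double (big-endian byte order)."""
    return struct.pack(">d", x).hex()

def as_single(x):
    """Round x to single precision and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

for name, value in (("pi", math.pi), ("1/10", 0.1)):
    print(name, "single:", single_hex(value), "double:", double_hex(value))
    print("  single vs. double difference:", abs(as_single(value) - value))
```

Neither value is exactly representable in binary, so both precisions store a rounded approximation; the single-precision one is simply coarser.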

Motivation

• Finite precision causes round-off error
  o Compromises ill-conditioned calculations
  o Hard to detect and diagnose
• Increasingly important as HPC scales
  o Need to balance speed (singles) and accuracy (doubles)
  o Double-precision may still fail on long-running computations

Previous Solutions

• Analytical (Wilkinson et al.)
  o Requires numerical analysis expertise
  o Conservative static error bounds are largely unhelpful
• Ad hoc
  o Run experiments at different precisions
  o Increase precision where necessary
  o Tedious and time-consuming
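As a rough illustration of the ad hoc approach, the sketch below runs the same accumulation at two precisions and compares the errors. The workload (summing 0.1 ten thousand times) and the helper names are assumptions made for this example, not an experiment from the talk; single precision is emulated by rounding after every operation.

```python
import struct

def as_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def repeated_sum(term, count, single):
    """Add `term` to an accumulator `count` times at the chosen precision."""
    if single:
        term = as_single(term)
    total = 0.0
    for _ in range(count):
        total = total + term
        if single:
            total = as_single(total)   # emulate a single-precision accumulator
    return total

exact = 1000.0
err_single = abs(repeated_sum(0.1, 10000, single=True) - exact)
err_double = abs(repeated_sum(0.1, 10000, single=False) - exact)
print("single-precision error:", err_single)
print("double-precision error:", err_double)
```

Repeating this by hand for every variable or loop of interest is what makes the approach tedious.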

Our Approach

• Run Dyninst-based mutator
  o Find floating-point instructions
  o Insert new code or a call to shared library
• Run instrumented program
  o Analysis augments/replaces original program
  o Store results in a log file
• View output with GUI

Advantages

• Automated (vs. manual)
  o Minimize developer effort
  o Ensure consistency and correctness
• Binary-level (vs. source-level)
  o Include shared libraries without source code
  o Include compiler optimizations
• Runtime (vs. compile time)
  o Dataset and communication sensitivity

Previous Work

• Cancellation detection
  o Logs numerical cancellation of binary digits
• Alternate-precision analysis
  o Simulates re-compiling with different precision

Summer Contributions

• Cancellation detection
  o Improved support for multi-core analysis
• Overflow detection
  o New tool for logging integer overflow
  o Possibilities for extension and incorporation into floating-point analysis
• Alternate-precision analysis
  o New “in-place” analysis
  o Much-improved performance and robustness

Cancellation

• Loss of significant digits during subtraction operations
• Cancellation is a symptom, not the root problem
• Indicates that a loss of information has occurred that may cause problems later

    1.613647 (7)            1.613647 (7)
  - 1.613635 (7)          - 1.613647 (7)
    0.000012 (2)            0.000000 (0)
  (5 digits cancelled)    (all digits cancelled)

    1.6136473
  - 1.6136467
    0.0000006

Cancellation Detector

• Instrument every addition and subtraction
  o Simple exponent-based test for cancellation
  o Log the results to an output file
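One plausible form of an exponent-based test is sketched below in Python. It is an assumption about how such a test could work, not the tool's actual instrumentation: the number of cancelled binary digits is taken to be the larger operand exponent minus the exponent of the result. The function name and the choice to report 53 bits for a zero result are hypothetical.

```python
import math

def cancelled_bits(x, y, result):
    """Estimate the binary digits cancelled when result = x + y or x - y."""
    operand_exps = [math.frexp(v)[1] for v in (x, y) if v != 0.0]
    if not operand_exps:
        return 0                      # both operands are zero: nothing to cancel
    if result == 0.0:
        return 53                     # complete cancellation of a double
    # A result exponent below the largest operand exponent means leading
    # digits were cancelled; otherwise report zero.
    return max(0, max(operand_exps) - math.frexp(result)[1])

a, b = 1.613647, 1.613635
print(cancelled_bits(a, b, a - b))    # operands agree in their leading digits
print(cancelled_bits(1.0, 1.0, 1.0 + 1.0))
```

A detector built this way only needs the exponent fields of the operands and the result, which keeps the per-instruction cost low.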

Contributions (cancellation detection)

• Better support for multi-core
  o Log to multiple files
  o Future work: exploring GUI aggregation schemes
• Ran experiments on AMG2006

Contributions (overflow detection)

• New proof-of-concept tool
  o Instruments all instructions that set OF (the overflow flag)
  o Logs the instruction pointer to output
  o Works on integer instructions
  o Introduces ~10x overhead
• Future work
  o Pruning false positives
  o Overflow/underflow detection on floating-point instructions
  o NaN/Inf detection on floating-point instructions
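For readers unfamiliar with the overflow flag, the following Python sketch emulates a 32-bit integer addition and computes OF the way x86 defines it for `add`: both operands have the same sign and the result has the opposite sign. The function name is hypothetical, and this models the flag only; it is not the instrumentation itself.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def add32_with_of(a, b):
    """Return (32-bit result, overflow flag) for a signed 32-bit addition."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # OF is set when the result's sign differs from the sign of both operands.
    overflow = ((a ^ result) & (b ^ result) & SIGN32) != 0
    return result, overflow

print(add32_with_of(0x7FFFFFFF, 1))   # largest positive value plus one
print(add32_with_of(0xFFFFFFFF, 1))   # -1 + 1: carry out, but no signed overflow
```

The second call shows one source of the false positives mentioned above in reverse: unsigned wraparound sets the carry flag, not OF, so a tool keyed on OF alone sees only signed overflow.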

Alternate-precision Analysis

• Previous approach
  o Replace floating-point values with a pointer
  o “Shadow” values allocated on heap
• Disadvantages
  o Major change in program semantics (copying vs. aliasing)
  o Lots of pointer-related bugs
  o Required function calls and use of a garbage collector
  o Large performance impact (>200-300x)
  o Increased memory usage (>1.5x)

Contributions (alternate-precision analysis)

• New shadow-value analysis scheme
  o Narrowed focus: doubles → singles
  o In-place downcast conversion (no heap allocations)
  o Flag in the high bits to indicate replacement

[Figure: downcast conversion of a 64-bit double into a "replaced double". The high 32 bits hold the flag 0x7FF4DEAD (labelled "Non-signalling NaN") and the low 32 bits hold the single-precision value.]
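The in-place layout can be modelled on raw 64-bit patterns. The Python sketch below is an illustration under stated assumptions, not the tool's code: the helper names are hypothetical, values too large for single precision are mapped to infinity (the slides do not say how the tool treats them), and a genuine double whose high word happens to equal the flag would be misread as already replaced.

```python
import math
import struct

FLAG = 0x7FF4DEAD  # flag value from the slide, stored in the high 32 bits

def double_to_bits(x):
    """Raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def is_replaced(bits64):
    """True if the high word carries the replacement flag."""
    return (bits64 >> 32) == FLAG

def downcast_in_place(bits64):
    """Replace a double's bit pattern with flag (high word) + single (low word)."""
    if is_replaced(bits64):
        return bits64
    value = struct.unpack("<d", struct.pack("<Q", bits64))[0]
    try:
        single = struct.pack("<f", value)
    except OverflowError:
        # Assumption for this sketch: out-of-range values become infinity.
        single = struct.pack("<f", math.copysign(math.inf, value))
    return (FLAG << 32) | struct.unpack("<I", single)[0]

def replaced_value(bits64):
    """Read the single-precision value stored in the low word."""
    return struct.unpack("<f", struct.pack("<I", bits64 & 0xFFFFFFFF))[0]

r = downcast_in_place(double_to_bits(0.1))
print(hex(r), replaced_value(r))
```

Because the flagged pattern has an all-ones exponent and a nonzero fraction, any uninstrumented code that reads it as a double sees a NaN rather than a plausible wrong number.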

Contributions (alternate-precision analysis)

• Simpler analysis
  o Instrument instructions w/ double-precision operands
  o Check and replace operands
  o Replace double-precision opcodes
  o Fix up flags if necessary
• Streamlined instrumentation
  o Insert “binary blobs” of optimized machine code
  o Pre-generated by mini-assembler inside mutator
  o Prevents overhead of added function calls
  o No memory overhead

Example

gvec[i,j] = gvec[i,j] * lvec[3] + gvar

    movsd  0x601e38(%rax,%rbx,8), %xmm0
    mulsd  -0x78(%rsp), %xmm0
    addsd  -0x4f02(%rip), %xmm0
    movsd  %xmm0, 0x601e38(%rax,%rbx,8)

Example

gvec[i,j] = gvec[i,j] * lvec[3] + gvar

    movsd  0x601e38(%rax,%rbx,8), %xmm0
        check/replace -0x78(%rsp) and %xmm0
    mulss  -0x78(%rsp), %xmm0
        check/replace -0x4f02(%rip) and %xmm0
    addss  -0x20dd43(%rip), %xmm0
    movsd  %xmm0, 0x601e38(%rax,%rbx,8)
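The instrumented sequence can be emulated on bit patterns to see what the program computes after replacement. This Python sketch is a model under assumptions rather than the generated machine code: all function names are hypothetical, values are assumed to stay within single-precision range, and `mulss`/`addss` are emulated by rounding the double-precision result of each operation to single precision.

```python
import struct

FLAG = 0x7FF4DEAD  # replacement flag from the slides

def round_to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def double_bits(x):
    """Raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def check_replace(bits64):
    """Return the operand in replaced form (flag high word, single low word)."""
    if (bits64 >> 32) == FLAG:
        return bits64
    value = struct.unpack("<d", struct.pack("<Q", bits64))[0]
    return (FLAG << 32) | struct.unpack("<I", struct.pack("<f", value))[0]

def low_single(bits64):
    """Single-precision value stored in the low 32 bits."""
    return struct.unpack("<f", struct.pack("<I", bits64 & 0xFFFFFFFF))[0]

def single_op(op, dst_bits, src_bits):
    """Emulate mulss/addss on the low words; the flag stays in the high word."""
    a = low_single(check_replace(dst_bits))
    b = low_single(check_replace(src_bits))
    result = round_to_single(a * b if op == "mul" else a + b)
    return (FLAG << 32) | struct.unpack("<I", struct.pack("<f", result))[0]

def instrumented_update(gvec_ij, lvec_3, gvar):
    """gvec[i,j] = gvec[i,j] * lvec[3] + gvar on 64-bit patterns."""
    xmm0 = gvec_ij                           # movsd: load, unchanged
    xmm0 = single_op("mul", xmm0, lvec_3)    # check/replace, then mulss
    xmm0 = single_op("add", xmm0, gvar)      # check/replace, then addss
    return xmm0                              # movsd: store the flagged value

out = instrumented_update(double_bits(0.1), double_bits(3.0), double_bits(0.2))
print(hex(out), low_single(out))
```

The stored result keeps the flag, so later instrumented instructions that load `gvec[i,j]` find it already replaced and skip the conversion.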

Challenges

• Currently handled
  o %rip- and %rsp-relative addressing
  o %rflags preservation
  o Math functions from libm
  o Bitwise operations (AND/OR/XOR/BTC)
  o Size and type conversions
  o Compiler optimization levels
  o Packed instructions

[Figure: packed instructions in a 128-bit XMM register. One layout shows four packed IEEE singles; the other shows packed IEEE doubles, where downcast conversion replaces each 64-bit slot with the flag 0x7FF4DEAD followed by an IEEE single.]

Challenges

• Future work
  o 80-bit “long double” precision
  o 16-bit IEEE half-precision
  o 128-bit IEEE quad-precision
  o Width-dependent random number generation
  o Non-gcc compilers
  o Arcane floating-point hacks
    • Sqrt: (1<<29) + (tmp >> 1) - (1<<22)
    • Fast InvSqrt: 0x5f3759df - (val >> 1)
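Both hacks listed above manipulate the raw bits of a 32-bit single as an integer, which is presumably why they are awkward for an analysis that changes operand widths. The Python sketch below applies the two expressions to single-precision bit patterns; the helper names are hypothetical, and the customary Newton refinement step of the fast inverse square root is left out so that only the expression from the slide is shown.

```python
import struct

def f32_bits(x):
    """Raw 32-bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits):
    """Single-precision value for a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def approx_sqrt(x):
    """Approximate sqrt(x) via (1<<29) + (tmp >> 1) - (1<<22) on the bits."""
    tmp = f32_bits(x)
    return bits_f32((1 << 29) + (tmp >> 1) - (1 << 22))

def approx_inv_sqrt(x):
    """Approximate 1/sqrt(x) via 0x5f3759df - (val >> 1) on the bits."""
    val = f32_bits(x)
    return bits_f32(0x5F3759DF - (val >> 1))

print(approx_sqrt(4.0), approx_inv_sqrt(4.0))
```

The constants encode the single-precision exponent bias and field widths, so the same expressions applied to a 64-bit double (or to a flagged replaced double) would produce meaningless values.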

Results

• Runs correctly on Sequoia kernels and other examples:

  Kernel      Overhead
  AMGmk       4x
  CrystalMk   4x
  IRSmk       7x
  UMTmk       3x
  LULESH      4x

• “Real” code with manageable overhead
• Future work: more optimization
• Future work: run on full benchmarks

Conclusion

• Cancellation detection
  o Improved support for multi-core analysis
• Overflow detection
  o New tool for logging integer overflow
  o Possibilities for extension and incorporation into floating-point analysis
• Alternate-precision analysis
  o New “in-place” analysis
  o Much-improved performance and robustness

Future Goals

• Selective analysis
  o Data-centric (variables or matrices)
  o Control-centric (basic blocks or functions)
• Analysis search space
  o Minimize precision
  o Maximize accuracy
• Goal: Tool for automated floating-point precision analysis and recommendation

Acknowledgements

Jeff Hollingsworth, University of Maryland (Advisor)
Bronis de Supinski, LLNL (Mentor)
Tony Baylis, LLNL (Supervisor)
Barry Rountree, LLNL
Matt Legendre, LLNL
Greg Lee, LLNL
Dong Ahn, LLNL

Thank you!
