Numerical-Precision- Optimized Volume Rendering Ingmar Bitter Neophytos Neophytou Klaus Mueller Arie Kaufman
Jan 11, 2016
Numerical-Precision-Optimized Volume Rendering
Ingmar Bitter Neophytos Neophytou
Klaus Mueller Arie Kaufman
Numerical-Precision-Optimized Volume Rendering
Ingmar Bitter Neophytos Neophytou
Klaus Mueller Arie Kaufman
Outline
• Numerical precision - a rendering resource
Outline
• Numerical precision - a rendering resource
• Fixed-point arithmetic
Outline
• Numerical precision - a rendering resource
• Fixed-point arithmetic
• Reverse order precision analysis
– Compositing, shading, gradients, classification, sampling/splatting, sample/splat location
Outline
• Numerical precision - a rendering resource
• Fixed-point arithmetic
• Reverse order precision analysis
– Compositing, shading, gradients, classification,sampling/splatting, sample/splat location
• Results
Outline
• Numerical precision - a rendering resource
• Fixed-point arithmetic
• Reverse order precision analysis
– Compositing, shading, gradients, classification, sampling/splatting, sample/splat location
• Results
• Conclusions
Numerical Precision: A Resource
• Double precision computation for all – ideal?
Numerical Precision: A Resource
• Double precision computation for all – ideal?
– slower then all other alternatives
– not possible on graphics cards (at least for now)
– expensive on custom chip implementations
– and most importantly:
not needed to create best possible images!!
Numerical Precision: A Resource
• Double precision computation for all – ideal?
– slower then all other alternatives
– not possible on graphics cards (at least for now)
– expensive on custom chip implementations
– and most importantly:
not needed to create best possible images!!
reasons: predominantly 8-bit displays (per channel)
limited range intervals throughout
Current Status• Stable volume rendering pipeline: both CPU and GPU
[LL94, Lev88, MJC02, Wes90, EKE01, RSEB00]
• Interpolation before classification, even for splatting [MMC99]
• Caching optimized for volume rendering[Kni00, LCCK02, PSL98]
• Precision-limited rendering systems: ATI, NVidia,VolumePro [PHK99], VizardII [MKW02], UltraVis [Kni00]
• Completely fixed: final output image display bit precision
– 8 bits per RGB color channel on CRTs and LCDs
– 8 bits max in DVI standard
– SGIs 12 bit color displays are nearly extinct
– Radiologists’ requirements are not mass market, same analysis applies
OpenGL Arithmetic: 12=1?
• Representation [0, 255] a = b = 255
• Computation = a[0, 255] × b[0, 255] >> 8;
= 254 wrong
1 mult, one shift
• Alternatively: tmp = a[0, 255] × b[0, 255] + 128; result = (tmp+(tmp >> 8)) >> 8;
= 255, correct [Bli95]
1 mult, 2 adds, 2 shifts
OpenGL Arithmetic: 12=1?
• Representation: fixed-point I.Fb
– I.Fb = I integer bits, F fraction bits
• 8 bits 1.7b fixed point number
then a = b = 11.7b = 128
• Computation = a1.7b × b1.7b >> 7
= 128 correct
1 mult, one shift
one fewer bit of resolution, but OK (we will see)
Reverse Order Precision Analysis
• Unified ray casting and splatting pipelines
• Composite creates the final image
Sample Location Splat Location
Sample Splat
Classify
Gradient
Shade
Composite
Ray Casting Splatting
Reverse Order Precision Analysis
• Unified ray casting and splatting pipelines
• Composite creates the final image
• Precision requirements propagate backwards
Sample Location Splat Location
Sample Splat
Classify
Gradient
Shade
Composite
Ray Casting Splatting
Compositing - Math
• Pre-(alpha)-multiplied colors:
– C = αC = αR, αG, αB
• Alpha correction (r samples per unit):
– Tcorrected = (1- α)r
Compositing - Math• Pre-(alpha)-multiplied colors:
– C = αC = αR, αG, αB
• Alpha correction:
– Tcorrected = (1- α)r
• With back-to-front compositing:
– CCompositingBuffer ×= Tcorrected += Cfront
– TCompositingBuffer ×= Tcorrected; αCompositingBuffer = 1-Tcorrected
– perform multiplication N times per pixel
correct solution needs N × F × r bits precision
T/CCompositingBuffer
Tcorrected, Cfront
T/CCompositingBuffer
Compositing – Precision Theory• 8-bit destination resolution
– therefore all partial results can be rounded
– drop all bits not contributing to the 8 most significant bits (MSB)
• Adding N = 2p samples – allows 8+p bits to influence the 8 MSB
• Conversion from αCompositingBufferC to C for display (division)– allows 8+p more bits to influence the 8 MSB
• Conversion from αcorrectedC to C for display– allows r times as many bits to influence the 8 MSB
• Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α– 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling
– 608 bits for 5122×2048 volumes and 16 samples per voxel
Compositing – Precision Theory• 8-bit destination resolution
– therefore all partial results can be rounded
– drop all bits not contributing to the 8 most significant bits (MSB)
• Adding N = 2p samples – allows 8+p bits to influence the 8 MSB
• Conversion from αCompositingBufferC to C for display (division)– allows 8+p more bits to influence the 8 MSB
• Conversion from αcorrectedC to C for display– allows r times as many bits to influence the 8 MSB
• Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α– 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling
– 608 bits for 5122×2048 volumes and 16 samples per voxel
Compositing – Precision Theory• 8-bit destination resolution
– therefore all partial results can be rounded
– drop all bits not contributing to the 8 most significant bits (MSB)
• Adding N = 2p samples – allows 8+p bits to influence the 8 MSB
• Conversion from αCompositingBufferC to C for display (division)– allows 8+p more bits to influence the 8 MSB
• Conversion from αcorrectedC to C for display– allows r times as many bits to influence the 8 MSB
• Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α– 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling
– 608 bits for 5122×2048 volumes and 16 samples per voxel
Compositing – Precision Theory• 8-bit destination resolution
– therefore all partial results can be rounded
– drop all bits not contributing to the 8 most significant bits (MSB)
• Adding N = 2p samples – allows 8+p bits to influence the 8 MSB
• Conversion from αCompositingBufferC to C for display (division)– allows 8+p more bits to influence the 8 MSB
• Conversion from αcorrectedC to C for display– allows r times as many bits to influence the 8 MSB
• Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α– 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling
– 608 bits for 5122×2048 volumes and 16 samples per voxel
Compositing – Precision Theory• 8-bit destination resolution
– therefore all partial results can be rounded
– drop all bits not contributing to the 8 most significant bits (MSB)
• Adding N = 2p samples – allows 8+p bits to influence the 8 MSB
• Conversion from αCompositingBufferC to C for display (division)– allows 8+p more bits to influence the 8 MSB
• Conversion from αcorrectedC to C for display– allows r times as many bits to influence the 8 MSB
• Sufficient resolution is: r × 2 × (8+p) for C, r × (8+p) for α– 32/16 bits for C/αCompositingBuffer for 2563 volumes and no super-sampling
– 608 bits for 5122×2048 volumes and 16 samples per voxel
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Precision Practice• No alpha correction (r = 1): 2 × (8+p) bits
• Iso-surface rendering using “old fashioned” OpenGL:
– store not αC but C in frame buffer: (8+p)
– bright colors: 5+p
– at most 8 non-zero samples per ray (p=3): 5+3=8 bits
standard 24 bit RGBA frame buffer is adequate
• Fog visualization
– what matters is the ability to see objects though volumetric fog (substance with low opacity)
– visual experiments show 15 fractional bits are sufficient
Compositing – Conclusion
• Preferred bit-aware back-to-front compositing equations:
• αC1.15b ×= T1.15bsample += C1.15b
sample
• T1.15b ×= T1.15bsample
8 10 12 14 15 16
Least-significant-bit-fog at various bit precisions
5123 datasetr = 2
Shading - Math
• PhongCcolor = kambient OobjectColor IlightIntensity
+ kdiffuse O Σi{ Ii (N•Li) }
+ kspecular Σi{ Ii (R•Li)r }
• k є [0,1] kambient + kdiffuse + kspecular =1
• OobjectColor (8 bit) and IlightIntensity є [0,1]
• N•Li and R•Li є [-1,1], but є [0,1] after clamping
• PhongCcolor = є [0,1] (possibly clamping Σi)
Shading - Analysis
• PhongCcolor needs to be as precise as 1.15b
• Use 16.16b for all multiplications [0,1)× [0,1]
– sufficient precision and no overflow
Shading – New Computation• Replace specular
exponentiation with recursive multiplies
– repeatedly multiply number with itself
– works for all exponents r=2n
– when r=26 (16 bit precision), then max error < 0.005%
– better results than Knittel’s parabola approximation
Shading – New Computation• Replace specular
exponentiation with recursive multiplies
– repeatedly multiply number with itself
– works for all exponents r=2n
– when r=26 (16 bit precision), then max error < 0.005%
– better results than Knittel’s parabola approximation
pow
r=2n
Knittel’sparabola
Shading - Conclusion
• Preferred bit-aware Phong shading equation:
C16.16b = k16.16bambient O0.8b
objectColor I16.16blight
+ k16.16bdiffuseO0.8b Σi{ I16.16b
i (N16.16b•L16.16bi) }
+ k16.16bspecular Σi{ I16.16b
i (R16.16b•L16.16bi)2^n }
Gradients - Math
• Gx = 0.5 sample(x+1,y,z) - 0.5 sample(x-1,y,z)
• Gy = 0.5 sample(x,y+1,z) - 0.5 sample(x,y-1,z)
• Gy = 0.5 sample(x,y,z+1) - 0.5 sample(x,y,z-1)
Gradients - Analysis
• G = G1.Fb
• Discrete nearest gradient vector neighbors
– sin φ = 1/2F, sin φ ≈ φ → φ ≈ 1/2F
• Maximum error for specular intensity, large r
– r = 64, 164 != 1, but 164 = (1- 1/2F)64
– error of 22%, 6.1%, 1.6%, 0.4%for F of 8, 10, 12, 14
φ
Gradients - Analysis
• 5123-sized spheres with Phong highlights
• 4, 6, 8, 10, 12, 14 bit gradients
• Diffuse artifacts for 4 and 6 bits
• Specular artifacts up to 10 bits
1210 14
64 8
1210 14
Gradients - Conclusion• Thus, 12 bits dynamic range is needed
• Now consider normalization:
– reduces I.Fb to 1.Fb
– up to I bits will be added to the fractional part
• Volume samples often have 12 bits
• Gx,y,z with 12.12b minimum representation
• Gx,y,z with 16.16b preferred representation
– leaves room for interpolation bits in normalization
Classification – Prelims and Recaps
• Use of T instead of α is more efficient in compositing operation
• Largest visual precision/quantization error occurs at high transparencies (low opacities)
– need more bits for T than for C, just to be sure
• Want transfer function lookup table to be cache-friendly
– power-of-2 RGBA-tuple alignment
• Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02]
Classification – Prelims and Recaps
• Use of T instead of α is more efficient in compositing operation
• Largest visual precision/quantization error occurs at high transparencies (low opacities)
– need more bits for T than for C, just to be sure
• Want transfer function lookup table to be cache-friendly
– power-of-2 RGBA-tuple alignment
• Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02]
Classification – Prelims and Recaps
• Use of T instead of α is more efficient in compositing operation
• Largest visual precision/quantization error occurs at high transparencies (low opacities)
– need more bits for T than for C, just to be sure
• Want transfer function lookup table to be cache-friendly
– power-of-2 RGBA-tuple alignment
• Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02]
Classification – Prelims and Recaps
• Use of T instead of α is more efficient in compositing operation
• Largest visual precision/quantization error occurs at high transparencies (low opacities)
– need more bits for T than for C, just to be sure
• Want transfer function lookup table to be cache-friendly
– power-of-2 RGBA-tuple alignment
• Would like to use pre-integrated classification for color and opacity transfer functions [EKE01, MGS02]
Classification - Math
• Desired lookup table entries:
R1.8bG1.8bB1.8bT1.16b 5.5 bytes
• Common lookup table entries:
R0.8bG0.8bB0.8bα0.8b 4 bytes
Classification - Math• Desired lookup table entries:
R1.8bG1.8bB1.8bT1.16b 5.5 bytes
• Common lookup table entries:
R0.8bG0.8bB0.8bα0.8b 4 bytes
• Better lookup table entries:
R0.8bG0.8bB0.8bsqrt(α)0.8b spreads low α
• Computed lookup after T = 1-(sqrt(α)2):
R0.8bG0.8bB0.8bT1.16b squaring doubles precision
Classification - Conclusion
• Preferred bit-aware lookup table entries: R0.8bG0.8bB0.8bsqrt(α)0.8b
Foot with least-significant-thin-tissue-fog
α0.8b sqrt(α)0.8b α0.16b
Sample Interpolation - Math
• sample = voxel0 × (1-w) + voxel1 × w
• sample = w × (voxel1 - voxel0) + voxel0
• Requirements:
– Gx,y,z, derived from samples, need 12 bit dynamic range
– samples need 12 bit values for transfer function lookup
– cover both low and high dynamic range neighborhoods
• Therefore, sample12.12b is a minimum requirement
– integer part comes from voxels voxel12.0b
– fractional part comes from interpolation w1.12b
Sample Interpolation - Conclusion
• Preferred bit-aware sample interpolation:
sample12.12b = w1.12b × (voxel112.0b - voxel012.0b) + voxel012.0b
• Splats start on voxels, need no interpolation:
splat12.0b = voxel12.0b
Sample Location - Math
• k-th sample location = startPos + Σk Vinc
• Perspective rays need to differ enough to allow 1024 rays across 60 degrees, or 0.05◦
– sin φ = (k 1/2F) / k, sin φ ≈ φ → φ ≈ 1/2F
– F = 6, 12, 16 → φ = 0.9◦, 0.05◦, 0.0009◦
• Also, need to address 2048 slices (integer positions) → 11bits
• Thus, need overall 11.12b
φ
k
Sample Location - Conclusion
• Preferred bit-aware sample location:
– perspective projection:
sampleLocation11.12b = startPos11.12b + Σ Vinc1.12b
– parallel projection: sampleLocation11.6b (0.9◦ OK)
Splat Scan Conversion - Math
• Splats project onto image grid → reverse rays
• Allow as many as 2048 splat rays across 60 degrees, or 0.025◦
• Hence, twice the ray casting precision
– one extra fractional bit F=13
• Also address 2048 slices (11bits)
• Thus, need overall 11.13b
φ
Splat Scan Conversion - Conclusion
• Preferred bit-aware splat scan conversion:splatLocation11.13b = startVoxelPos11.13b + Σ Vinc
1.13b
• Splats are usually pre-transformed and stored in bucket lists (one per sheet-buffer)
• Preferred voxel location sheet buffer formatx11.13b u8.0b y11.13b v8.0b (64 bits total)
– x, y: location on splat plane
– u: index into pre-integrated splat table
– v: voxel value (x, y) y)
u
Results• Summary of minimum precision requirements
Rendering Stage Input Output
Sample locations N/A 11.12b
Sample interpolation 12.00b 12.12b
Classification 12.00b 4× 0.8b
Gradients 12.12b 1.12b
Shading 1.12b 1.15b
Compositing 1.15b 1.15b
Results• Restricted iso-surface rendering:
– texture map volume rendering can be done using plain OpenGL or Direct X and 8 bit frame buffers
• General volume rendering, all pipeline stages:
– 32 bit single precision floating point format
– 16.16b fixed point format (up to 4x faster in our tests)
Pentium allows 2 simple 32-bit integer ops per clock cycle
Conclusions• 8 bits per RGB channel on final display
• Analysis of requirements by back propagation
• Sufficient precision computations using
– either 32 bits single precision floating point format
– or 16.16b fixed point format
• Voxel location sheet buffer x11.13bu8.0by11.13bv8.0b
• Transfer functions stored as R0.8bG0.8bB0.8bsqrt(α)0.8b
• Compositing/fragment buffer R1.15bG1.15bB1.15bT1.15b
Acknowledgements• Hewlett Packard Laboratories
• ONR grant N000140110034
• NSF CAREER grant ACI-0093157
• DOE grant MO-068
• Thanks to Tom Malzbender and Michael Meissner for technical discussions.
• Thanks to Ronald Summers for resources.