Definitions
CPU (Central Processing Unit) – general-purpose processor able to execute computer programs.
GPU (Graphics Processing Unit) – dedicated graphics rendering device.
Price and Performance
The NVIDIA GeForce 6800 Ultra can reach 40 GFLOPS, whereas a 3 GHz Intel Pentium 4 reaches only 6 GFLOPS. [1]
More impressively, current cards such as the ATI Radeon HD 5870, AMD FireStream 9250, and NVIDIA GeForce 9800 run at roughly 1 to 3 TFLOPS.
Reasons for this include highly parallel vector processing, fast onboard memory, and a constrained pipeline that streams data without stalls.
Evolution
GPU performance has approximately doubled every 6 months since the mid-1990s.
CPU performance doubles roughly every 18 months on average (the commonly quoted form of Moore’s law).
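To make the two doubling rates concrete, the sketch below compares the growth they imply over the same time span. The function name `growth_factor` and the three-year horizon are illustrative choices for this example, not figures taken from the references.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return the performance multiplier after `months` of steady doubling."""
    return 2.0 ** (months / doubling_period_months)


horizon = 36  # three years, chosen only for illustration

gpu = growth_factor(horizon, 6)   # doubling every 6 months
cpu = growth_factor(horizon, 18)  # doubling every 18 months

print(f"GPU: x{gpu:.0f}, CPU: x{cpu:.0f}, ratio: {gpu / cpu:.0f}")
# prints "GPU: x64, CPU: x4, ratio: 16"
```

Under these assumed rates, the gap between the two curves itself grows exponentially, which is why even a short horizon produces a large ratio.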
Alternative applications
GPUs are increasingly used for scientific computing with data-parallel algorithms. Examples include:
Image Stitching
Fast seamless stitching and tone-mapping of gigapixel images. (~1 hour on a notebook PC)
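What makes a task like tone-mapping data-parallel is that each output pixel depends only on its own input pixel. The sketch below illustrates that property with a simple Reinhard-style curve; this operator is a stand-in chosen for the example and is not the method used in the gigapixel work mentioned above. The names `tone_map` and `tone_map_image` are hypothetical.

```python
def tone_map(luminance: float) -> float:
    """Reinhard-style curve: compress a value in [0, inf) into [0, 1)."""
    return luminance / (1.0 + luminance)


def tone_map_image(pixels):
    """Apply the curve to every pixel independently.

    No pixel reads its neighbours, so on a GPU each pixel could be
    handled by its own thread. Here the parallelism is only modelled
    with a nested list comprehension.
    """
    return [[tone_map(p) for p in row] for row in pixels]


hdr = [[0.0, 1.0], [3.0, 9.0]]  # made-up high-dynamic-range luminances
ldr = tone_map_image(hdr)
print(ldr)  # prints [[0.0, 0.5], [0.75, 0.9]]
```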
Key differences
TYPICAL GPU
Ordered sequence of rendering steps. Fixed hardware dedicated to each step.
LARRABEE
Runs most of its pipeline in software on multiple general-purpose x86 cores.
This allows the rendering pipeline to be reconfigured dynamically, so steps can be skipped or given extra resources when required.
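The idea of a software-defined pipeline can be sketched as an ordinary list of functions: reconfiguring the pipeline means editing the list. The stage names below are hypothetical placeholders that only record that they ran; they are not Larrabee's actual stages or API.

```python
def vertex_shading(trace):
    return trace + ["vertex_shading"]


def rasterization(trace):
    return trace + ["rasterization"]


def pixel_shading(trace):
    return trace + ["pixel_shading"]


def blending(trace):
    return trace + ["blending"]


def run_pipeline(stages, trace=None):
    """Run each stage in order; the list itself is the pipeline configuration."""
    trace = [] if trace is None else trace
    for stage in stages:
        trace = stage(trace)
    return trace


full_pipeline = [vertex_shading, rasterization, pixel_shading, blending]

# Skipping a step is just a matter of leaving it out of the list.
no_blend = [s for s in full_pipeline if s is not blending]

print(run_pipeline(full_pipeline))
print(run_pipeline(no_blend))
```

In fixed-function hardware the equivalent of `full_pipeline` is wired into the chip, so the second configuration would not be possible without leaving a hardware stage idle.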
Larrabee CPU Core
The Larrabee core is “derived” from the Pentium processor.
1 scalar unit for single operations and 1 vector unit for multiple operations.
32KB L1 data and instruction cache.
256 KB L2 cache per core; the L2 caches share a ring network.
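The split between the scalar unit and the vector unit can be illustrated as follows. This is a minimal sketch: the lane count `VECTOR_WIDTH = 16` is an assumption made for the example, and the vector operation is only modelled with a Python loop rather than real SIMD hardware. All function names are hypothetical.

```python
VECTOR_WIDTH = 16  # assumed lane count for this sketch


def scalar_add(a, b):
    """One operation on one pair of values."""
    return a + b


def vector_add(a, b):
    """One operation applied to every lane at once (modelled with a loop)."""
    assert len(a) == len(b) == VECTOR_WIDTH
    return [x + y for x, y in zip(a, b)]


def add_arrays(a, b):
    """Add two equal-length arrays: full vectors first, scalar ops for the tail."""
    assert len(a) == len(b)
    out = []
    full = len(a) - len(a) % VECTOR_WIDTH
    for i in range(0, full, VECTOR_WIDTH):
        out.extend(vector_add(a[i:i + VECTOR_WIDTH], b[i:i + VECTOR_WIDTH]))
    for i in range(full, len(a)):
        out.append(scalar_add(a[i], b[i]))
    return out


a = list(range(20))
b = [1] * 20
print(add_arrays(a, b))  # one vector operation plus four scalar operations
```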
Details
The 32KB L1 cache is 4 times larger than the original Pentium's 8KB cache.
This is because each core performs four-way multithreading to reduce thread-switching overhead, so the cache has to serve four threads. (Not to be confused with simultaneous multithreading.)
The 256KB L2 caches share a ring network. If a core is unable to find data in its own L2 cache, it places a request on the ring network, and the data is eventually brought into its L2 cache.
Uses a rendering technique called binning, which divides the screen into regions and renders the polygons that fall in each region.
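Binning can be sketched as follows: the screen is cut into square tiles, and each triangle is listed in every tile that its bounding box overlaps. This is a simplified illustration under stated assumptions (square tiles, bounding-box overlap as the test, integer pixel coordinates); it is not Larrabee's actual renderer, and the name `bin_triangles` is hypothetical.

```python
def bin_triangles(triangles, width, height, tile):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    The bounding-box test is conservative: a triangle can be listed in a
    tile that it does not actually cover.
    """
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen.
        x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
        y0, y1 = max(min(ys), 0), min(max(ys), height - 1)
        if x0 > x1 or y0 > y1:
            continue  # entirely off-screen
        for ty in range(y0 // tile, y1 // tile + 1):
            for tx in range(x0 // tile, x1 // tile + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins


triangles = [
    [(10, 10), (20, 10), (10, 20)],     # fits inside the top-left tile
    [(50, 50), (100, 50), (50, 100)],   # straddles all four tiles
]
bins = bin_triangles(triangles, width=128, height=128, tile=64)
for key in sorted(bins):
    print(key, bins[key])
```

Once the bins are built, each tile can be rendered independently from its own triangle list, which is what lets the work be spread across cores.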
Benefits of Larrabee
Game physics
Real-time ray tracing
Image and video processing
Physical simulation
Extended rendering capabilities
References
[1] Z. Fan, F. Qiu, A. Kaufman, S. Yoakum-Stover. 2004. GPU Cluster for High Performance Computing. ACM/IEEE Supercomputing Conference 2004, November 06-12, Pittsburgh, PA.
[2] L. Seiler et al. 2008. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Transactions on Graphics, vol. 27, no. 3, Article 18, August 2008.