Interactive Subdivision of Smooth Surfaces on GPUs
Anjul Patney, Mohamed S. Ebeida*, John D. Owens
University of California, Davis; *Carnegie Mellon University

Results

Model      Input patches   Output grids   Split time   Dice time   Rendering performance
Teapot     32              4,823          2.69 ms      1.27 ms     60.1 fps
Killeroo   11,532          14,426         6.30 ms      3.46 ms     29.7 fps

[Figure: memory usage (MB) and subdivision time (ms) versus bucket width (pixels).]
[Figure: time (ms) versus number of grids.]

Subdivision Surfaces

Abstract
We present a strategy for performing view-adaptive, crack-free tessellation of Catmull-Clark subdivision surfaces entirely on programmable graphics hardware. Our scheme extends the concept of breadth-first subdivision, which up to this point has only been applied to parametric patches. While mesh representations designed for a CPU often involve pointer-based structures and irregular per-element storage, neither of these is well-suited to GPU execution. To solve this problem, we use a simple yet effective data structure for representing a subdivision mesh, and design a careful algorithm to update the mesh in a completely parallel manner. We demonstrate that in spite of the complexities of the subdivision procedure, real-time tessellation to pixel-sized primitives can be done. Our implementation does not rely on any approximation of the limit surface, and avoids both subdivision cracks and T-junctions in the subdivided mesh. Using the approach in this paper, we are able to perform real-time subdivision for several static as well as animated models. Rendering performance is scalable for increasingly complex models.

Parametric Surfaces

Abstract
We present a GPU-based implementation of Reyes-style adaptive surface subdivision, known in Reyes terminology as the Bound/Split and Dice stages.
The performance of this task is important for the Reyes pipeline to map efficiently to graphics hardware, but its recursive nature and its irregular, unbounded memory requirements present a challenge to an efficient implementation. Our solution begins by characterizing Reyes subdivision as a work queue with irregular computation, targeted to a massively parallel GPU. We propose efficient solutions to these general problems by casting our solution in terms of the fundamental primitives of prefix-sum and reduction, often encountered in parallel and GPGPU environments. Our results indicate that real-time Reyes subdivision can indeed be obtained on today's GPUs. We are able to subdivide a complex model to subpixel accuracy within 15 ms. Our measured performance is several times better than that of Pixar's RenderMan. Our implementation scales well with the input size and depth of subdivision. We also address concerns of memory size and bandwidth, and analyze the feasibility of conventional ideas on screen-space buckets.

Algorithm
[Figure: flowchart. Input surface → evaluate screen-space bound → is bound > threshold? Yes: split, and the split surfaces are bounded again. No: dice into output micropolygons.]

Breadth-first Subdivision
[Figure: elements numbered 1 to 5.]

Summary
[Figure: Dynamic Work-Queue. A queue of elements A to I shrinks to A B C E F H and then to A C E. Labels: fast scan-based compaction (Sengupta '07), coalesced scatter.]

Conclusion
• Recursive subdivision in real time
  – Breadth-first formulation
  – Maps well to GPUs
• First step towards a real-time Reyes pipeline

Future Work
• Resolving subdivision cracks
• Displacement mapping

[Panel headings in this part of the poster: Approach, Summary, Results, Fixing Cracks.]

[Figure: mesh elements (Face, Vertex, Edge) and the points they generate.]
Face:   v0, v1, v2, v3   → Face-point
Edge:   v0, v1, f0, f1   → Edge-point
Vertex: x, y, z, val.    → Vertex-point
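The three record types listed in the poster (Face, Edge, Vertex) map naturally onto flat, index-based arrays. The following is a minimal CPU sketch in Python, not the authors' GPU code. It assumes a closed, all-quad mesh and uses the standard Catmull-Clark rules for face-points, edge-points, and vertex-points. The function names (`build_edges`, `refine_points`) and the cube data are hypothetical, chosen only for illustration. The sequential accumulation loops stand in for what could be done with atomic adds on a GPU; that mapping is an assumption.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def build_edges(faces):
    """Derive edge records (v0, v1, f0, f1) from quad faces (closed mesh assumed)."""
    owners: Dict[Tuple[int, int], List[int]] = {}
    for f, quad in enumerate(faces):
        for i in range(4):
            a, b = quad[i], quad[(i + 1) % 4]
            owners.setdefault((min(a, b), max(a, b)), []).append(f)
    return [(a, b, fs[0], fs[1]) for (a, b), fs in sorted(owners.items())]


def refine_points(vertices, faces, edges):
    """Compute face-, edge-, and vertex-points for one Catmull-Clark step.

    vertices: list of (x, y, z, val), faces: list of (v0, v1, v2, v3),
    edges: list of (v0, v1, f0, f1). All references are array indices.
    """
    pos = [v[:3] for v in vertices]

    # Face-point: average of the four corner vertices.
    face_pts = []
    for quad in faces:
        total = (0.0, 0.0, 0.0)
        for v in quad:
            total = add(total, pos[v])
        face_pts.append(scale(total, 0.25))

    # Edge-point: average of the two endpoints and the two adjacent face-points.
    edge_pts = [
        scale(add(add(pos[v0], pos[v1]), add(face_pts[f0], face_pts[f1])), 0.25)
        for v0, v1, f0, f1 in edges
    ]

    # Scatter face-points and edge midpoints into per-vertex sums.
    # (Sequential here; an atomic add per contribution is one GPU analogue.)
    f_sum = [(0.0, 0.0, 0.0)] * len(vertices)
    r_sum = [(0.0, 0.0, 0.0)] * len(vertices)
    for f, quad in enumerate(faces):
        for v in quad:
            f_sum[v] = add(f_sum[v], face_pts[f])
    for v0, v1, _f0, _f1 in edges:
        mid = scale(add(pos[v0], pos[v1]), 0.5)
        r_sum[v0] = add(r_sum[v0], mid)
        r_sum[v1] = add(r_sum[v1], mid)

    # Vertex-point: (Q + 2R + (n - 3) P) / n for valence n.
    vertex_pts = []
    for (x, y, z, n), fs, rs in zip(vertices, f_sum, r_sum):
        q = scale(fs, 1.0 / n)
        r = scale(rs, 1.0 / n)
        moved = add(add(q, scale(r, 2.0)), scale((x, y, z), n - 3.0))
        vertex_pts.append(scale(moved, 1.0 / n))
    return face_pts, edge_pts, vertex_pts


# Hypothetical test mesh: a cube with corners at (+-1, +-1, +-1), valence 3.
CUBE_VERTICES = [
    (1.0 if i & 1 else -1.0, 1.0 if i & 2 else -1.0, 1.0 if i & 4 else -1.0, 3)
    for i in range(8)
]
CUBE_FACES = [(0, 2, 6, 4), (1, 3, 7, 5), (0, 1, 5, 4),
              (2, 3, 7, 6), (0, 1, 3, 2), (4, 5, 7, 6)]
CUBE_EDGES = build_edges(CUBE_FACES)
```

Because every record refers to other elements by index rather than by pointer, each face-point, edge-point, and vertex-point can be computed independently of the others, which is the property a data-parallel update relies on.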
Maintain three arrays: Vertices, Faces, and Edges.
Cracks due to view-adaptive subdivision are fixed entirely in parallel.
Rendering time grows linearly with scene complexity.
Performance grows with complexity, and settles at 2 to 2.5 Mfaces/sec.

Conclusion
• Parallel GPU tessellation of Catmull-Clark surfaces
• Robust data management for subdivision
• Dynamic view-dependence
• Fixing cracks in parallel

Future Work
• Efficient memory management
• Programmable geometry caching

[Figure labels: Atomic add, Atomic add.]

Model          Input faces   Output faces   Rendering time
Big Guy        1,450         91,992         37.12 ms
Monster Frog   1,292         80,452         34.17 ms
Killeroo       2,894         80,227         31.34 ms
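The parametric-surfaces abstract describes Reyes subdivision as a work queue built on prefix-sum, and the work-queue figure shows elements being compacted. A minimal CPU sketch of that idea in Python follows; it is an illustration under stated assumptions, not the authors' implementation. The patch representation (a parameter-space rectangle), the bound estimate `screen_bound`, and all function names are hypothetical. Dicing is omitted: the function returns the patches that are small enough to dice.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

# Hypothetical patch: a rectangle (u0, u1, v0, v1) in parameter space.
Patch = Tuple[float, float, float, float]


def exclusive_scan(flags: Sequence[int]) -> List[int]:
    """Exclusive prefix sum: out[i] = sum(flags[:i])."""
    return list(accumulate(flags, initial=0))[:-1]


def compact(items: Sequence, flags: Sequence[int]) -> List:
    """Keep the items whose flag is 1, using scan offsets as scatter addresses."""
    offsets = exclusive_scan(flags)
    total = offsets[-1] + flags[-1] if flags else 0
    out = [None] * total
    for i, item in enumerate(items):  # each iteration is independent
        if flags[i]:
            out[offsets[i]] = item
    return out


def screen_bound(p: Patch, pixels_per_unit: float) -> float:
    """Stand-in for a real screen-space bound: longest side in pixels."""
    u0, u1, v0, v1 = p
    return max(u1 - u0, v1 - v0) * pixels_per_unit


def split(p: Patch) -> Tuple[Patch, Patch]:
    """Halve the patch along its longer parametric direction."""
    u0, u1, v0, v1 = p
    if u1 - u0 >= v1 - v0:
        um = 0.5 * (u0 + u1)
        return (u0, um, v0, v1), (um, u1, v0, v1)
    vm = 0.5 * (v0 + v1)
    return (u0, u1, v0, vm), (u0, u1, vm, v1)


def bound_split(patches: Sequence[Patch], threshold: float,
                pixels_per_unit: float) -> List[Patch]:
    """Breadth-first Bound/Split: process the whole queue once per pass."""
    queue = list(patches)
    ready: List[Patch] = []
    while queue:
        needs_split = [1 if screen_bound(p, pixels_per_unit) > threshold else 0
                       for p in queue]
        done = [1 - f for f in needs_split]
        ready.extend(compact(queue, done))       # small enough: ready to dice
        to_split = compact(queue, needs_split)   # too large: split again
        next_queue = [None] * (2 * len(to_split))
        for i, p in enumerate(to_split):         # child slots are known up front
            next_queue[2 * i], next_queue[2 * i + 1] = split(p)
        queue = next_queue
    return ready
```

For example, `compact(list("ABCDEFGHI"), [1, 1, 1, 0, 1, 1, 0, 1, 0])` keeps `A B C E F H`, matching the first step in the work-queue figure. Because the scan gives every surviving element its output address before anything is written, the queue can grow or shrink each pass without any element waiting on another.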