
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. X, NO. X, MONTH 20XX

Multi-Depth-Map Raytracing for Efficient Large-Scene Reconstruction

Murat Arikan, Reinhold Preiner and Michael Wimmer

Abstract—With the enormous advances of acquisition technology over the last years, fast processing and high-quality visualization of large point clouds have gained increasing attention. Commonly, a mesh surface is reconstructed from the point cloud, and a high-resolution texture is generated over the mesh from the images taken at the site to represent surface materials. However, this global reconstruction and texturing approach becomes impractical with increasing data sizes. Recently, due to its potential for scalability and extensibility, a method has been proposed that textures a set of depth maps in a preprocessing step and stitches them at runtime to represent large scenes. However, the rendering performance of this method depends strongly on the number of depth maps and their resolution. Moreover, for the proposed scene representation, every single depth map has to be textured by the images, which in practice heavily increases processing costs. In this paper, we present a novel method that breaks these dependencies by introducing an efficient raytracing of multiple depth maps. In a preprocessing phase, we first generate high-resolution textured depth maps by rendering the input points from image cameras, and then perform a graph-cut based optimization to assign a small subset of these points to the images. At runtime, we use the resulting point-to-image assignments (1) to identify for each view ray which depth map contains the closest ray-surface intersection and (2) to efficiently compute this intersection point. The resulting algorithm accelerates both the texturing and the rendering of the depth maps by an order of magnitude.

Index Terms—Point-based rendering, raytracing depth maps, large-scale models


1 INTRODUCTION

The high-quality reconstruction and visualization of large scenes from huge amounts of raw sensor data is an important and particularly challenging task in many application areas, ranging from the digitization and preservation of cultural heritage, over virtual reality and games, to planning and visualization for architecture and industry. To virtually recreate such scenes, geometry is reconstructed from scanned 3D point-cloud data and commonly textured from registered high-resolution photographs taken at the original site.

In practice, computing a high-quality texturing from such images is a non-trivial task due to image overlaps, varying lighting conditions, different sampling rates and image misregistrations. One potential workflow represents the geometry as a point cloud again and directly texture-maps the resulting point-based surface [1], [2]. However, this approach can exhibit visible artifacts like illumination seams and texture misalignments, which heavily degrade the visual quality of the result. A more common approach is to convert the point data into a mesh once [3], [4] and then render the scene as a textured mesh, reducing both memory and bandwidth consumption. In order to obtain the required texturing, an image-to-triangle assignment (also called labeling) problem has to be solved, for which state-of-the-art methods [5], [6] use a graph-cut based optimization, which provides a homogeneous and high-quality solution.

• M. Arikan, R. Preiner and M. Wimmer are with the Institute of Computer Graphics and Algorithms, Vienna University of Technology, Austria. E-mail: [email protected]

In large-scale scenarios, this is done once in an expensive preprocessing phase, and the resulting textured mesh is then used for efficient rendering. However, these methods are not very flexible – any change or addition to the geometry or image data requires an expensive relabeling of the mesh – and do not scale well due to the time complexity of the global labeling. Moreover, large-scale scenarios require an out-of-core computation of the mesh [7] and its texturing, imposing an additional maintenance overhead.

State of the art: To break down the problem complexity and accelerate the reconstruction and labeling preprocessing, Arikan et al. [8] introduced a localized textured surface reconstruction and visualization approach. They employ a set of Textured Depth Maps to represent the scene as a collection of surface patches, avoiding the reconstruction and maintenance of the whole surface and significantly reducing the optimization costs by labeling only a set of small depth maps instead of a large out-of-core mesh. These patches are triangulated and stitched at runtime, trading a minor increase in rendering time against a huge decrease in preprocessing time. Moreover, the patch-based representation offers both more flexibility and better scalability, since new patches can be added and textured easily without recomputing the whole surface. However, the rendering performance heavily depends on the number of depth maps and their resolution. This introduces a natural bound on the depth-map resolution usable for interactive rendering, thus limiting the achievable geometric quality in the rendered image.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TVCG.2015.2430333

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].


[Fig. 1 panels: (c) state of the art: labeling in 7 min, 45 fps (left) and labeling in 87 min, 5 fps (right); (d) our method: labeling in 3.4 min, 43 fps]

Fig. 1. (a) Point-cloud and image data acquired by a scanner. The data set consists of 682M points and 192 images. (b) Scene overview rendered by our method. (c) and (d) compare the state of the art [8] and our method in terms of performance and quality. The previous approach has to settle for a significantly lower geometric resolution in order to reach the performance of our new method.

Solution approach: We introduce an output-sensitive visualization technique for such a patch-based surface representation. Instead of stitching high-resolution depth maps, which is expensive, we perform a multi-depth-map raytracing approach, which efficiently identifies for each view ray the depth map that contains the closest valid ray-surface intersection, and then finds this intersection point. Our method also avoids the labeling of every single depth map in the preprocessing, but instead labels a strongly reduced subset of the original point cloud, which in practice accelerates the labeling process by over an order of magnitude. To obtain high-quality per-pixel labels for texturing, this coarse point set is projected to the screen and its labels are upsampled using a geometry-aware Voronoi decomposition of the depth buffer at runtime.

As our main contribution over the state of the art, we propose a novel raytracing approach whose performance is independent of the number and resolution of the depth maps, therefore allowing for a high-quality real-time visualization of large scenes at much higher geometric resolution than the previous approach [8] (Fig. 1).

2 RELATED WORK

The problem of textured scene reconstruction and visualization from large point clouds and photographs has been addressed by several authors.

Point-based rendering techniques like surface splatting [9], [10], [11], [12] render the input points as elliptical surface primitives (splats), which are blended to obtain a smooth continuous surface. These methods have been coupled with texturing [1], [2] to obtain a textured point-based visualization of a scene. Texture-mapping point-based surfaces avoids a costly large-scale mesh reconstruction, but does not produce optimal point-to-texture assignments. This can produce visible artifacts like texture misalignments and illumination seams.

Mesh-based textured reconstruction techniques achieve a continuous high-quality texturing of the scene by performing a global, graph-cut based optimization of the triangle-to-texture assignments on a single huge mesh [5], [6]. These methods produce high-quality visualizations of large scenes, but require a time-expensive preprocessing for the mesh reconstruction and labeling as well as a large maintenance overhead, making them inflexible to changes and extensions in the data set.

Therefore, Arikan et al. [8] recently proposed a patch-based reconstruction approach, which breaks down the meshing and labeling complexity by representing the scene by several surface patches, allowing for both a more efficient preprocessing and a more flexible and scalable data management. Their method generates a set of textured depth maps in a preprocessing step and stitches them at runtime, which strongly couples the rendering performance with the number and resolution of these depth maps.


Our method builds upon this localized approach for the data representation, but alleviates its performance limitations by introducing a solution for an efficient raytracing of multiple depth maps. Finding ray intersections with surfaces represented by two-dimensional range maps has various applications, like rendering soft shadows [13], [14] or reflections [15].

Previous methods for raytracing large-scale scenes depend on the use of spatial acceleration data structures. Reshetov et al. [16] employ a spatial kd-tree to detect scene parts that are guaranteed not to intersect with a collection of view rays. Agrawala et al. [13] proposed a hierarchical ray traversal to skip over large sections of a ray that cannot possibly intersect the scene. Xie et al. [14] raytrace a multi-layer depth map to reduce shadowing artifacts. To cope with the additional overhead of searching for an intersection point in multiple layers, they introduced a hierarchical intersection test against a quadtree, where each node contains the minimum and maximum depth values of the four child nodes in the layer below. In contrast, we use multiple single-layer depth maps covering a scene and employ a labeled coarse subset of the original point cloud to directly determine the depth map that is first intersected by a view ray. This is done by splatting the label information of these points into the screen, and upsampling their labels to obtain per-pixel labels. The resulting label of a pixel then indicates the depth map to be intersected by the pixel's corresponding view ray.

In the following, we give an overview of our preprocessing and rendering pipeline, and then describe each step of our reconstruction and texturing system in detail.

3 OVERVIEW

Our method takes as input a high-density 3D point cloud (denoted by PHD), for example from a laser scanner, and a set of high-resolution photographs {Ij} with known camera registrations. We propose a two-phase solution for an efficient high-quality visualization of the data.

In the preprocessing phase, we generate high-resolution depth maps by rendering the input point cloud PHD from image cameras (Fig. 2a, Section 4.1), and compute an image-to-point assignment (referred to as labeling) only for a small subset PLD ⊆ PHD (Fig. 2b, Sections 4.2 and 4.3), which we will call proxy points.

At runtime, we reconstruct a high-resolution depth buffer, which stores depth values of the scene as viewed from the user's camera. This is done by first splatting proxy points, and then raytracing the precomputed depth maps, starting from the coarse splat positions (Fig. 2c, Section 5.1). In a second step, the labels of PLD are used to obtain an upsampled depth-buffer labeling, which is required for texturing the final output image (Section 5.2).

4 PREPROCESSING

4.1 Generating the Depth Maps

For each image Ii, we generate a depth map Di by rendering the original point cloud PHD from the same viewpoint and with the same viewing parameters as Ii. For rendering, we use oriented circular splats as rendering primitives and employ an out-of-core octree data structure [17] to store PHD and stream visible points to the GPU. If point normals are not available, we compute them by fitting a least-squares plane to the neighborhood of each point. The splat radii are determined from the density of the rendered points [17].
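For illustration, a minimal CPU sketch of this step (our own simplification: single-pixel point projection with plain z-buffering instead of the splatting and out-of-core streaming described above; K, R, and t stand for the registered intrinsics and pose of image Ii and are assumed given):

```python
import numpy as np

def generate_depth_map(points, K, R, t, width, height):
    """Project the point cloud with the camera of image I_i and keep the
    closest depth per pixel (plain z-buffering). The paper's renderer
    instead splats oriented discs and streams points from an octree."""
    depth = np.full((height, width), np.inf)
    cam = points @ R.T + t                 # world -> camera space
    cam = cam[cam[:, 2] > 0]               # keep points in front of the camera
    proj = cam @ K.T                       # camera -> image plane
    u = (proj[:, 0] / proj[:, 2]).astype(int)
    v = (proj[:, 1] / proj[:, 2]).astype(int)
    z = cam[:, 2]
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for x, y, d in zip(u[ok], v[ok], z[ok]):
        if d < depth[y, x]:                # z-test: nearest point wins
            depth[y, x] = d
    return depth
```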

4.2 Generating the Proxy Points

The proxy points PLD are obtained by sub-sampling PHD. To this end, the octree storing PHD is pruned to contain only its k top-most levels, which correspond to the k lowest levels of detail of PHD. As we will show in Section 6.1, the choice of k is a trade-off between performance and rendering quality. We will also demonstrate that using only a small subset of the original point cloud as proxy points strongly accelerates the subsequent labeling stage, but is still sufficient for a high-quality textured reconstruction from the depth maps at render time.
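The pruning can be approximated without an octree implementation: keeping only the k top-most levels roughly corresponds to quantizing the points to a coarse grid and keeping one representative per cell. A sketch under that assumption (cell_size is a hypothetical stand-in for the octree node size at level k):

```python
import numpy as np

def subsample_proxy_points(points, cell_size):
    """Quantize points to a coarse grid and keep one representative per
    cell - a stand-in for pruning the octree to its k top-most levels."""
    keys = np.floor(points / cell_size).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first)]
```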

4.3 Labeling

To obtain a point-to-image assignment, we first determine a set of candidate images for each point p ∈ PLD. The image Ii is a candidate for p if p is not occluded from the camera view of Ii. In the second step, we pick for each point p its best-suited candidate image Ij for texturing, i.e., p is labeled with the index j.

This assignment has to consider the quality of the image-to-geometry mapping as well as continuity in the texturing (i.e., avoiding visible artifacts between areas labeled by different images). We solve this problem by a graph-cut based optimization, where the quality and continuity criteria are addressed by a data and a smoothness term, respectively. However, instead of operating on triangles as done in previous approaches, we use the knn-graph built upon the points as the input graph for the optimization. We use the same data and smoothness terms as Arikan et al. [8]: for the points, the data term favors orthogonal and close image views, while the smoothness term penalizes label changes with strong color differences along edges between neighboring points.
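Written out, this is a standard graph-cut labeling energy over the knn-graph; the formulation below is our paraphrase, with D_p, S_{p,q}, and the weight λ as generic placeholders for the concrete terms of Arikan et al. [8]:

```latex
E(l) \;=\; \sum_{p \in P_{LD}} D_p(l_p)
      \;+\; \lambda \sum_{(p,q) \in \mathcal{N}_{knn}} S_{p,q}(l_p, l_q)
```

Here, D_p(l_p) is small when image I_{l_p} sees the point p frontally and from close range, and S_{p,q} discourages label changes across knn-edges whose endpoints have strong color differences; a graph-cut solver minimizes E.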

5 MULTI-DEPTH-MAP RAYTRACING

In this section, we describe how the precomputed data is used at runtime to obtain a high-quality visualization of the scene. We perform two major steps, surface generation and color mapping, to render a textured surface.


[Fig. 2 panels: (a) Depth-map generation, (b) Point-cloud labeling, (c) Rendering]

Fig. 2. Overview of our pipeline. (a) High-resolution depth maps are generated by rendering the high-density input point cloud PHD from image cameras. The depth maps Di and Dj, lifted to 3D, are color-coded by their corresponding images Ii and Ij, respectively. (b) PHD is subsampled, and the resulting low-density point cloud PLD is labeled by the input images, i.e., each point of PLD is assigned to an input image. This concludes the preprocessing phase. (c) Coarse surface positions (marked with triangles) that are equipped with labels are efficiently obtained by splatting points of PLD. Then, starting from these positions, raytracing the respective depth maps yields high-resolution surface positions (marked with circles).

[Fig. 3 panels (a)-(e); panel (e) contrasts 1-layer and 2-layer raytracing]

Fig. 3. (a)-(d) Rendering pipeline. (a) Splatting proxy points PLD (color-coded according to labels). (b) Raytracing high-resolution depth maps. (c) Per-pixel labeling to be used for texturing. (d) Textured and shaded surface. (e) shows invalid intersections with discontinuity triangles that can occur when raytracing a single depth-map layer along each view ray.

The surface-generation step first renders PLD as splats to create a depth buffer representing coarse surface positions and a corresponding label buffer (Fig. 3a, Section 5.1.1). For rendering, we employ the same out-of-core data structure [17] that we used to generate the depth maps. Then, starting from these coarse positions, for each pixel the depth map indicated by the label buffer is raytraced in a full-screen rendering pass to produce a high-resolution depth buffer (Fig. 3b, Section 5.1.2).

The following color-mapping step splats PLD again to generate a high-resolution label buffer by upsampling the labels that were output in the first pass (Fig. 3c, Section 5.2.1).

Finally, high-resolution images relevant for texturing are cached on the GPU (Section 5.2.2), and the color of each pixel is retrieved in a full-screen pass by projecting it onto its assigned image based on the depth and label retrieved from the high-resolution depth and label buffers (Fig. 3d, Section 5.2.3).

In the following, we describe the individual steps of our rendering pipeline in more detail.


Fig. 4. Multiple label layers for raytracing. Invalid intersections (green circles) can be caused by initializing raytracing with the front-most splat position (green triangle) and its label i. In this case, starting from second-layer positions (red triangles) with label j, raytracing Dj produces valid intersection points (red circles).

5.1 Surface Generation

5.1.1 Visibility Stage

In the first pass, PLD is rendered with z-buffering, writing to a depth buffer B_d and a label buffer B_l. The generated buffers represent the front-most label layer, which will be used in the raytracing pass to compute the intersection points of view rays with depth maps. In particular, a ray cast from the viewpoint through the pixel position p = (x_p, y_p) will intersect the depth map indexed by the label l_p = B_l(x_p, y_p), and the intersection search will start at the 3D position q_p^0 corresponding to the depth value d_p = B_d(x_p, y_p).

This fast, direct selection technique gives the correct depth map for the vast majority of the view rays in the screen. However, in some cases, the labels in B_l will not correspond to a depth map that contains a valid ray intersection. This mostly happens for proxy points splatted very close to depth-map discontinuities and silhouettes (Figs. 3e and 4). In such a case, we retrieve the depth-map label for the intersection test from the next closer proxy-point splat along the ray with a different label. For this, we have to store a second label layer to look up the next depth map for raytracing if no valid intersection point is found in the first depth map (Figs. 3e and 4). To extract this second label layer, PLD is rendered again with z-buffering, and at each pixel p, fragments with label l_p or depth values less than d_p are discarded. The resulting depth and label values are written into two additional buffers. We then extend this approach to multiple layers computed in a depth-peeling fashion [18].
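The peeling rule can be illustrated per pixel; the following CPU sketch (not the GPU implementation; fragments is a hypothetical list of rasterized splat fragments covering one pixel) extracts successive label layers:

```python
def peel_label_layers(fragments, num_layers):
    """fragments: (depth, label) pairs of all proxy splats covering one
    pixel. Returns up to num_layers front-most (depth, label) entries,
    mimicking the peeling rule above: each pass discards fragments that
    lie in front of the previous layer or repeat its label."""
    layers = []
    prev_depth, prev_label = float("-inf"), None
    for _ in range(num_layers):
        candidates = [f for f in fragments
                      if f[0] > prev_depth and f[1] != prev_label]
        if not candidates:
            break
        prev_depth, prev_label = min(candidates)  # z-buffer: nearest wins
        layers.append((prev_depth, prev_label))
    return layers
```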

5.1.2 Raytracing Pass

Fig. 5. A single iteration of the iterative search, taking a step of h_0 along r_p. The start position q_p^0 and its label index l_p are retrieved from the closest ray-splat intersection.

We render a full-screen quad and perform for each screen-space pixel p an iterative search in the high-resolution depth map D_{l_p}, followed by a binary search. The iterative search starts at q_p^0 and uses a stepsize that adapts to the current estimated distance to the intersection. The next point on the ray is computed as follows:

    q_p^i = q_p^{i-1} + h_{i-1} · r_p,    (1)

where r_p is the normalized ray direction. The adaptive stepsize h_{i-1} is calculated as the signed distance of q_p^{i-1} to D_{l_p} along the line to the center of projection of I_{l_p} (Fig. 5). The distance is signed since the low-resolution depth-buffer value used as initialization can lie in front of or behind the high-resolution depth map.

Since q_p^0 provides a sufficiently good initialization, only a few iterations are required (except at oblique angles) to find a pair of points q_p^{k-1} and q_p^k enclosing an intersection. In a second step, the interval [q_p^{k-1}, q_p^k] is refined by a binary search to find a more accurate approximation q_p of the intersection point.

We then check whether q_p lies on a depth discontinuity of D_{l_p}. For this, we detect the four texels of D_{l_p} (yielding two triangles in 3D) that are nearest to the projection of q_p into D_{l_p}, and assume a discontinuity if the depth disparity between any two triangle vertices is above a user-defined threshold (20cm in our examples). Averaging the two triangle normals also provides us with per-pixel normals, which can optionally be used for lighting effects. In case of a depth discontinuity, raytracing is re-performed to find an intersection with the depth map retrieved from the next label layer (Fig. 4).

The results of the raytracing pass basically refine for each pixel the depth value and – in case of a discontinuity – the label value originally obtained from splatting PLD.
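For illustration, a CPU sketch of the per-pixel search (our own simplification; depth_at is a hypothetical helper that returns the signed distance from a point to D_{l_p} along the line to its center of projection, i.e., the adaptive stepsize of Eq. (1)):

```python
def raytrace_depth_map(q0, ray_dir, depth_at, max_iter=100, bin_iter=20):
    """Iterative search with adaptive stepsize (Eq. 1), refined by a
    binary search. depth_at(q) is assumed to return the signed distance
    of q to the depth map along the line to its center of projection;
    q0 is the coarse splat position, ray_dir the normalized view ray."""
    q_prev, h_prev = q0, depth_at(q0)
    for _ in range(max_iter):
        q = q_prev + h_prev * ray_dir          # Eq. (1): adaptive step
        h = depth_at(q)
        if h * h_prev < 0:                     # sign change: interval found
            lo, hi = q_prev, q
            for _ in range(bin_iter):          # binary refinement
                mid = 0.5 * (lo + hi)
                if depth_at(mid) * depth_at(lo) < 0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        q_prev, h_prev = q, h
    return None                                # try the next label layer
```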

5.2 Color Mapping

5.2.1 Labeling Pass

The aim of this rendering pass is to equip the high-resolution depth data from the previous pass with labels that are suitable for texturing. Unfortunately, we cannot use the label buffer B_l created in the visibility stage as is, since due to the low resolution of PLD, this buffer exhibits non-regular borders between differently labeled regions (Fig. 3b) and false labels near silhouettes (Figs. 6 and 8c).


Fig. 6. Illustration of false labels near silhouettes. View rays through the splat at the silhouette have valid intersections with D_i. Therefore, this splat projects its label i to the background, causing corresponding pixels of that region to be assigned the label i instead of j.

Instead, we compute a Voronoi decomposition of the screen space into equally labeled regions. The seeds of this decomposition are specified by the projection of the points c_j ∈ PLD into screen space, and distances between pixels and seed points are measured by the Euclidean distances of the respective points q_p and c_j in 3D. This way, each pixel will be assigned the label of its closest seed c_j. This results in a high-resolution label buffer with per-pixel labels upsampled from the sparse labeling information in PLD.

In practice, this is implemented by rendering PLD as splats using z-buffering, with the depth value of a splat at pixel p manually set to the 3D Euclidean distance d(c_j, q_p) between the splat center c_j and the point q_p. This pass stores at each pixel p (corresponding to the surface point q_p) the label of c_j with j = argmin_j d(c_j, q_p).
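A brute-force sketch of the same decomposition (our own CPU illustration; the GPU version never forms the full distance matrix, since z-buffering of the splats keeps the minimum distance implicitly):

```python
import numpy as np

def upsample_labels(q_buffer, seeds, seed_labels):
    """q_buffer: (H, W, 3) surface positions q_p; seeds: (M, 3) proxy
    centers c_j; seed_labels: (M,) their image labels. Each pixel gets
    the label of its 3D-nearest seed - the Voronoi decomposition that
    the splatting pass computes implicitly with z-buffering."""
    H, W, _ = q_buffer.shape
    q = q_buffer.reshape(-1, 3)
    # Brute-force pixel-to-seed distances; the GPU version only tests
    # seeds whose splats actually cover the pixel.
    d = np.linalg.norm(q[:, None, :] - seeds[None, :, :], axis=2)
    return seed_labels[np.argmin(d, axis=1)].reshape(H, W)
```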

5.2.2 Image Management

In this step, we employ an out-of-core streaming technique [8] for continuously caching the currently most relevant images into a GPU texture array, where the relevance of an image is measured by the frequency of occurrence of its label in the updated label buffer.

5.2.3 Texturing Pass

A full-screen quad is rendered to retrieve the color of each pixel p by projecting q_p onto the image indicated by the updated label buffer.
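As an illustration of this lookup, a minimal sketch reusing the pinhole-camera convention of the sketch in Section 4.1 (K, R, t are the registered parameters of the assigned image; nearest-neighbor sampling stands in for the actual texture filtering):

```python
import numpy as np

def texture_pixel(q, K, R, t, image):
    """Project the surface point q of a pixel into its assigned image
    (registered parameters K, R, t) and fetch the color; nearest-neighbor
    sampling stands in for the actual texture filtering."""
    u, v, w = K @ (R @ q + t)
    x, y = int(u / w), int(v / w)
    h, w_img, _ = image.shape
    if 0 <= x < w_img and 0 <= y < h:
        return image[y, x]
    return None  # q projects outside the assigned image
```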

In a last step, we perform an online screen-space leveling method [8] to balance the color intensities between regions textured by different photographs and thus reduce illumination seams in the final output image.

6 RESULTS

We have tested our approach on three different data sets acquired by a laser scanner (Table 1, Fig. 17).

TABLE 1
Scene characteristics.

Model                                 # Points   # Images
Hanghaus 2 Wohneinheit 6 (Scene 1)    35M        188
Hanghaus 2 Wohneinheit 1 (Scene 2)    682M       192
Centcelles (Scene 3)                  1091M      161

Scenes 1 and 2 are scans of different building units in terrace house (Hanghaus) 2 in the excavation of ancient Ephesus, while Scene 3 is a scan of the cupola of the Roman villa of Centcelles. In the following, we discuss performance and quality tradeoffs depending on the algorithm's main parameters, and give a detailed analysis of memory consumption, reconstruction error compared to ground truth, and the convergence of the iterative search. Then, we compare our approach (denoted by DMRT) to the related depth-map triangulation approach (denoted by DMT) in terms of both quality and performance.

All results in this paper were produced on a PC with an Intel i7-4770K 3.50 GHz CPU, 32 GB RAM and an NVIDIA GeForce GTX TITAN GPU. A framebuffer resolution of 1280 × 720 was used in all our experiments and the accompanying video.

6.1 Performance and Quality Tradeoffs

Number of layers. Currently, we extract layers in a depth-peeling fashion [18], which requires a geometry pass for every single layer. Therefore, the choice of the number of layers is a trade-off between rendering performance and quality. Table 2 shows that, even when using more than ten layers, DMRT achieves real-time frame rates. For the measurements in this table, we used a proxy point cloud that is sub-sampled from the original point cloud by a factor of 686 (as in Fig. 17). The table also shows a breakdown of the running time of the algorithm by its stages.

Size of proxy point cloud. In our approach, another key criterion for the rendering performance and quality as well as the labeling time is the size of the proxy point cloud. Fig. 7 shows renderings for different parameters. As expected, the number of layers required for a high-quality rendering decreases with growing sizes of the proxy point cloud. For the same layer count, on the other hand, a DMRT reconstruction with more proxy points results in an increase of the labeling time and a decrease of the rendering performance.

For more performance results, see Section 6.5.

6.2 GPU Memory Consumption

The GPU memory usage of our method is affected by several factors, including the number of input images, the size of the proxy point cloud, the layer count, and the framebuffer resolution.

For all our test scenes, we generated depth maps of size 1024 × 684. Each map consumes 2.8MB of GPU memory (one float per pixel).


[Fig. 7 data: |PLD| = 4.4M, 1M, 0.2M; labeling in 16.4 min, 3.4 min, 0.8 min; # layers = 3, 3, 4, 4, 7; fps = 44, 54, 48, 49, 36]

Fig. 7. Results for different parameters. The red ellipses indicate regions with some artifacts.

TABLE 2
Average performance of DMRT rendering (in ms) for different numbers of layers, measured during a walkthrough of Scene 2.

# layers           1     3     5     7     9     11
visibility stage   2     6     9.8   13.7  17.5  21.2
raytracing pass    1.2   2     2.3   2.7   3     3.3
labeling pass      2     2     2     2     2     2
img. man.          2.5   2.8   2.9   3.1   3.3   3.3
texturing pass     6.3   6.4   6.3   6.3   6.3   6.3
total              14    19.2  23.3  27.8  32.1  36.1
fps                71    52    43    36    31    28

As described in Section 5.2.2, the high-resolution input images are cached in a GPU texture array on demand. We reserve 1GB of GPU memory for them. We resort to low-resolution images (of size 256 × 171) if input images are not available in the texture array. All of these are stored on the GPU, and each requires 0.13MB. As an example, rendering Scene 2 requires 563MB for the 192 depth maps and low-resolution images.

Furthermore, each point of the proxy point cloud is represented by six floats for the position and the normal vector, and an integer for the label. A screen-space pixel in a layer requires two floats, one for the depth and one for the label. Therefore, an optimal DMRT rendering of Scene 2 with |PLD| = 1M at a resolution of 1280 × 720 and five layers (see the accompanying video) occupies an additional 65MB of GPU memory (28MB for the proxy points and 37MB for the layers).
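These figures follow from the stated buffer layouts; the following quick check (our own arithmetic, decimal megabytes, four-byte floats and integers) reproduces them:

```python
MB = 1e6                                      # decimal megabytes, as in the text
depth_map = 1024 * 684 * 4 / MB               # 2.8 MB per depth map (one float/pixel)
scene2 = 192 * (depth_map + 0.13)             # 192 depth maps + low-res images: ~563 MB
proxy = 1_000_000 * (6 * 4 + 4) / MB          # 1M points x (6 floats + 1 int): 28 MB
layers = 5 * (1280 * 720 * 2 * 4) / MB        # 5 layers x 2 values/pixel: ~37 MB
print(round(scene2), round(proxy + layers))   # -> 563 65
```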

6.3 Ground-Truth Comparison

In order to analyze the reconstruction error of DMRT, we rendered the scene from the viewpoint of one of the image cameras, and compared the color output and depth buffer at different stages of our rendering pipeline to the original image and its corresponding high-resolution depth map, respectively (Fig. 8). This comparison can give a first impression of the reconstruction error.

[Fig. 8 panels: (a) image & high-res. surface, (b) coarse surface, (c) raytraced surface, (d) raytraced surface with per-pixel labels (Voronoi decomposition); heat-map scales 0–0.1 and 0–1]

Fig. 8. Analysis of the reconstruction error. The scene is rendered as seen by the image shown in (a). The error is measured as the deviation of the color output and depth buffer at different rendering stages (b)-(d) from the reference image and its corresponding high-resolution depth map, respectively. The color and depth differences are visualized as heat maps shown in the right column.


[Fig. 9 rows: first layer (panels a-e), three additional layers (panels f-j)]

(a) initial (b) 1 iteration (c) 5 iterations (d) 30 iterations (e) 100 iterations
(f) initial (g) 1 iteration (h) 5 iterations (i) 100 iterations (j) our output

Fig. 9. (a)-(e) show absolute stepsizes of the iterative search in the first layer. For each of the white pixels in (e), where raytracing of the first layer fails, a maximum of three more layers are consecutively searched until a valid intersection is found. (f)-(i) show stepsizes in the layer where raytracing succeeds. Note that for the visualization, all stepsizes are multiplied by 10 and then clamped to the range [0, 1].

Note, however, that due to different lighting conditions when acquiring the different images, a full match to the "ground truth" is not possible. The rightmost column in Fig. 8 shows color and depth differences as heat maps. For this analysis, we used a sub-sampling factor of 686 (|PLD| = 1M) to generate the proxy point cloud, and rendered the scene using four layers. Figs. 8c and 8d show the DMRT reconstruction after a maximum of 100 iterative-search and 20 binary-search iterations.

A comparison of the heat maps (of depth differences) in Figs. 8b and 8c shows that raytracing reduces the overall depth error. As expected, the remaining differences are maximal at oblique angles and silhouettes. However, note that the differences at silhouettes are not generated by our raytracing method. Instead, they occur naturally since the depth map of the image and the raytraced depth maps have different sampling rates of the observed surface, and thus exhibit slight geometric variations at silhouettes.

Interestingly, the overall color error is minimal, except inside the two small rooms. This is because the labeling assigns the points there to images that have better geometric resolution, but were acquired under different lighting conditions than the reference image in Fig. 8a.

Fig. 8c shows that while raytracing resolves the geometry at silhouettes adequately, it generates false labels in these regions by mapping the labels of proxy splats to the background (see also Fig. 6). As shown in the accompanying video, these false labels generate ghosting artifacts during animations; they are resolved by our per-pixel labeling step (Fig. 8d).

(a) initial (b) 4 layers (c) 11 layers

Fig. 10. Worst-case scenario. (a) shows a poor initialization of the stepsizes of the iterative search, therefore requiring many layers for a high-quality visualization. Our output with four (b) and eleven (c) layers.

6.4 Convergence

In this section, we analyze the convergence of the iterative search with adaptive stepsize, which is responsible for finding a "tight" pair of points enclosing an intersection point to seed the binary search. We also discuss the limits of our rendering method for a synthetically generated scene configuration.

We rendered the scene using the same parameters as in Section 6.3. Figs. 9a-9e show absolute stepsizes of the iterative search in the first label layer. For some pixels, our raytracing failed to find intersections in this layer. These pixels are marked white in Fig. 9e, and for each of them, an intersection point is searched in three additional layers. Figs. 9f-9i show absolute stepsizes in the layer where an intersection point is found.

We perform a total of c_total = Σ_{i=1}^{k} c_i iterative-search iterations for each pixel, where 1 ≤ c_i ≤ c_max is the number of iterations performed in the i-th layer. The maximum iteration count in each layer is bounded by c_max (100 in this example), and k refers to the index of the layer where the intersection is found (or to the user-defined maximum layer count).


TABLE 3
Comparison of the labeling times and rendering performance on Scene 2.

DMT
depth-map res.              256×171    512×342    1024×684
avg. labels per depth map   46
labeling times              7 min      23 min     87 min
min/avg/max fps             32/45/84   11/17/46   3/5/16

DMRT
depth-map res.              1024×684
|PLD|                       1M
# labels                    192
# layers                    5
labeling time               3.4 min
min/avg/max fps             30/43/74

In practice, the iterative search converges in a few iterations to an intersection point, if one exists. Otherwise, it terminates early if an intersection with a discontinuity triangle is found. In our experiment, the iteration count c_total was on average 4.6 over all pixels, and the raytracing pass took 2.4 ms to complete (including the binary search procedure).

The convergence of the iterative search is only guaranteed if each texel along the projection of the view ray onto the depth map is visited, which is slow if the depth-map resolution is high. The iterative search with adaptive stepsize, on the other hand, proved very efficient in practice, finding a pair of points enclosing an intersection point within a few iterations.

In order to assess the performance of our raytracing under a poor initialization of the stepsizes (Fig. 10a), we multiplied the splat radii by 2.5 and rendered the scene again. In this scenario, the iterative search required on average c_total = 9.5 iterations per pixel, and the raytracing pass completed in 4.2 ms. Even though our raytracing was still efficient, four layers were not sufficient to obtain a high-quality result (Fig. 10b). To obtain a result (Fig. 10c) comparable to Fig. 9j, eleven layers were required, and the raytracing pass completed in 8 ms with c_total = 11 on average. We see that the most performance-critical part of our rendering pipeline is still the extraction of the layers, while searching for intersections in these layers is quite efficient (see also Table 2).

6.5 Comparison to DMT

Finally, we compare our method to the related depth-map triangulation approach on Scene 2. For this comparison, we used a proxy point cloud of size 1M and five layers for the DMRT approach. Our experiments suggest that this configuration is more than sufficient for a not completely artifact-free, but high-quality DMRT rendering. On the other hand, depending on the chosen stitching threshold, DMT can produce severe artifacts (Fig. 11).

Table 3 compares the labeling times and rendering performance of DMT and DMRT for differently sized depth maps.

[Fig. 11 annotations: threshold ε; 5cm, 10cm]

Fig. 11. DMT's stitching artifacts. Top: Due to a small stitching threshold ε, the points p and q are considered as non-overlapping by DMT, leading to the point p on the low-resolution depth map being chosen for texturing. Bottom: In DMT, visibility is not resolved for features smaller than the ε threshold. Thus, the invisible point q can shine through the front surface.

[Fig. 12 plot: rendering time in ms (0–300) over walkthrough time in s; curves: DMT 256×171, DMT 512×342, DMT 1024×684, DMRT 1024×684]

Fig. 12. Comparison of the rendering performance of DMT and our DMRT approach for a walkthrough of Scene 2. Using high-resolution depth maps, our method runs at 43 fps, being on average about an order of magnitude faster than the previous work, which has to settle for a quarter of the resolution to reach this performance.

Since the resolution of the depth maps does not have a direct effect on the performance of DMRT, we used the highest resolution for our approach. The table shows that DMT strongly couples the labeling time and rendering performance to the resolution of the depth maps used to represent the scene. If we aim for an equal-quality comparison (Figs. 1c right and 1d), DMT needs to label 192 depth maps of size 1024×684, which takes about 26 times longer (87 min) than labeling the 1M proxy points used by DMRT (3.4 min). While DMT cannot render depth maps of this size in real time anymore (5 fps on average), our new raytracing method is about 9 times faster, thus providing a real-time high-quality visualization of the scene (Figs. 1d and 12, see also Table 2).


(a) one layer (b) two layers (c) three layers

Fig. 13. (a)-(c) show the results of raytracing different numbers of layers along each view ray. Raytracing a single layer (a) produces severe artifacts (black background pixels), especially near silhouettes. In this example, artifacts produced by raytracing two layers (b) are barely visible. Adding a third layer almost completely removes artifacts.

Fig. 14. The left image shows the coarse surface (without raytracing). By using textured coarse positions where raytracing fails, the splats along silhouettes spuriously occlude the background (right top). Therefore, we always discard pixels if no valid intersection with the surface could be found (right bottom).

Reducing the depth-map resolution to 256×171 allows DMT to almost match these performance values for labeling and real-time rendering, but noticeably reduces the geometric resolution of the output (Fig. 1c left).

7 LIMITATIONS AND FUTURE WORK

Number of layers. We found that extracting a few layers in the visibility stage (Section 5.1.1) is sufficient for high-quality visualizations (Fig. 13). However, in scenes of higher geometric complexity, more layers might be required (e.g., see Fig. 10). At the moment, we use a naive implementation that performs k geometry passes for k layers, which can become inefficient as k increases. In such cases, more elaborate A-buffer techniques could be incorporated to achieve a multi-layer setup in a single pass [19]. Also, for the few pixels where raytracing fails to find a valid intersection with any of the layers, we show the background color instead of textured coarse surface points (Fig. 14).

Fig. 15. Sub-sampling issue. The shown view ray intersects a discontinuity edge of I_i. Due to the poor sampling of the surface S by proxy points, there isn't any second layer to search for a valid intersection in this case.

[Fig. 16 rows: coarse (top) and raytraced (bottom), over time]

Fig. 16. Label changes under camera motion can lead to view-dependent geometry of silhouettes.

We opted for this solution since splats along silhouettes can also occlude the background.

Size of proxy point cloud. As discussed in Section 6.1, the sub-sampling factor is a trade-off between performance and quality. In order to achieve high performance, this factor has to be large enough, but should be small enough to maintain fine surface details. Currently, we discard the highest levels of detail of the input point cloud to obtain the proxy points. However, a feature-aware sub-sampling strategy could produce an even better rendering quality, since the generation of the proxy points currently does not take local surface characteristics into account. Fig. 15 illustrates the absence of layers for raytracing, even for a reasonable coverage of the surface by proxy splats.

Motion artifacts. Depth maps can have slightly varying representations of silhouettes based on the viewing angle and distance relative to the observed surface. Thus, label changes under camera motion can lead to raytracing of depth maps with possibly different representations of silhouettes (Fig. 16).

Inherited artifacts. Other rendering artifacts that are inherited from the previous approach [8] are flickering during animations, and false textures at some silhouettes due to image misregistrations and the noise inherent in point clouds.

Extension. Note that the runtime steps required to create a high-resolution depth buffer (splatting a small number of proxy points in the visibility stage and performing an efficient raytracing in a full-screen pass) are so fast that they could be run twice per frame. This could be used, for example, to create a shadow map for a moving light source, allowing dynamic shadows at interactive frame rates.


[Fig. 17 rows: Scene 1 (87×), Scene 2 (686×), Scene 3 (2075×); columns: labeled proxy splats, raytracing pass, labeling pass, texturing pass]

Fig. 17. Results from three data sets. From left to right: splatted proxy points with (from top to bottom) increasing sub-sampling factors of the original point cloud ranging from 87× up to 2075×; raytraced surface without and with per-pixel labels; and textured surface. The insets demonstrate how the labels of the column are mapped to the back wall if the labeling pass is not applied.


8 CONCLUSION

In this paper, we introduced a novel multi-depth-map raytracing approach for high-quality reconstruction and visualization of large-scale scenes. In a preprocessing step, we generate multiple high-resolution depth maps and perform a graph-cut based optimization of the point-to-image assignments (point labels) on a strongly reduced subset of the original point cloud. At runtime, we first reconstruct a high-resolution depth buffer by raytracing these depth maps, where the labels indicate which depth maps to intersect. In a second step, we compute high-quality per-pixel labels from the sparse label information and use these for texturing the depth buffer.

We have shown that our method allows for a real-time visualization of large-scale scenes at much higher geometric resolution than the related state of the art, which is based on rendering and stitching many depth maps. Our results also indicate a huge performance gain in the labeling step compared to the previous method.

REFERENCES

[1] D. T. Guinnip, S. Lai, and R. Yang, "View-dependent textured splatting for rendering live scenes," in ACM SIGGRAPH 2004 Posters (SIGGRAPH '04), New York, NY, USA: ACM, 2004. [Online]. Available: http://doi.acm.org/10.1145/1186415.1186474

[2] D. Sibbing, T. Sattler, B. Leibe, and L. Kobbelt, "SIFT-realistic rendering," in Proc. 2013 International Conf. 3D Vision (3DV 13), 2013, pp. 56–63.

[3] M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson surface reconstruction," in Proc. 4th Eurographics Symp. Geometry Processing (SGP 06), 2006, pp. 61–70.

[4] M. Kazhdan and H. Hoppe, "Screened Poisson surface reconstruction," ACM Trans. Graph., vol. 32, no. 3, pp. 29:1–29:13, June 2013.

[5] V. Lempitsky and D. Ivanov, "Seamless mosaicing of image-based texture maps," in Computer Vision and Pattern Recognition (CVPR 07), IEEE, June 2007, pp. 1–6.

[6] R. Gal, Y. Wexler, E. Ofek, H. Hoppe, and D. Cohen-Or, "Seamless montage for texturing models," Computer Graphics Forum, vol. 29, no. 2, pp. 479–486, 2010.

[7] M. Bolitho, M. Kazhdan, R. Burns, and H. Hoppe, "Multilevel streaming for out-of-core surface reconstruction," in Proc. 5th Eurographics Symp. Geometry Processing (SGP 07), 2007, pp. 69–78.

[8] M. Arikan, R. Preiner, C. Scheiblauer, S. Jeschke, and M. Wimmer, "Large-scale point-cloud visualization through localized textured surface reconstruction," IEEE Transactions on Visualization and Computer Graphics, vol. 99, no. PrePrints, p. 1, 2014.

[9] M. Botsch and L. Kobbelt, "High-quality point-based rendering on modern GPUs," in Proc. 11th Pacific Conf. Computer Graphics and Applications (PG 03), 2003, pp. 335–343.

[10] M. Botsch, M. Spernat, and L. Kobbelt, "Phong splatting," in Proc. First Eurographics Conf. Point-Based Graphics (SPBG 04), Eurographics Association, 2004, pp. 25–32. [Online]. Available: http://dx.doi.org/10.2312/SPBG/SPBG04/025-032

[11] M. Zwicker, J. Rasanen, M. Botsch, C. Dachsbacher, and M. Pauly, "Perspective accurate splatting," in Proc. Graphics Interface 2004 (GI 04), Canadian Human-Computer Communications Society, 2004, pp. 247–254. [Online]. Available: http://dl.acm.org/citation.cfm?id=1006058.1006088

[12] M. Botsch, A. Hornung, M. Zwicker, and L. Kobbelt, "High-quality surface splatting on today's GPUs," in Proc. 2nd Eurographics / IEEE VGTC Symp. Point-Based Graphics (SPBG 05), 2005, pp. 17–24.

[13] M. Agrawala, R. Ramamoorthi, A. Heirich, and L. Moll, "Efficient image-based methods for rendering soft shadows," in Proc. 27th Annual Conf. Computer Graphics and Interactive Techniques (SIGGRAPH 00), ACM Press/Addison-Wesley, 2000, pp. 375–384. [Online]. Available: http://dx.doi.org/10.1145/344779.344954

[14] F. Xie, E. Tabellion, and A. Pearce, "Soft shadows by ray tracing multilayer transparent shadow maps," in Proc. 18th Eurographics Conf. Rendering Techniques (EGSR 07), Eurographics Association, 2007, pp. 265–276. [Online]. Available: http://dx.doi.org/10.2312/EGWR/EGSR07/265-276

[15] C. Zhang, H.-H. Hsieh, and H.-W. Shen, "Real-time reflections on curved objects using layered depth texture," in IADIS International Conf. Computer Graphics and Visualization, 2008.

[16] A. Reshetov, A. Soupikov, and J. Hurley, "Multi-level ray tracing algorithm," in ACM SIGGRAPH 2005 Papers (SIGGRAPH 05), New York, NY, USA: ACM, 2005, pp. 1176–1185. [Online]. Available: http://doi.acm.org/10.1145/1186822.1073329

[17] C. Scheiblauer and M. Wimmer, "Out-of-core selection and editing of huge point clouds," Computers and Graphics, vol. 35, no. 2, pp. 342–351, Apr. 2011.

[18] C. Everitt, "Interactive order-independent transparency," NVIDIA, Tech. Rep., 2001.

[19] H. Gruen and N. Thibieroz, "OIT and indirect illumination using DX11 linked lists," in GDC, 2010.

Murat Arikan is a Ph.D. student at the Institute of Computer Graphics and Algorithms of the Vienna University of Technology. He received his M.Sc. degree in Mathematics from Vienna University of Technology in 2008. His current research interests are real-time rendering, point-based rendering, and interactive modeling.

Reinhold Preiner received his B.Sc. degree in Computer Science from Graz University in 2008 and his M.Sc. degree in Computer Science from Vienna University of Technology in 2010. His research interests include reconstruction, geometry processing, and interactive global illumination. He is now an assistant professor and doctoral researcher at the Institute of Computer Graphics and Algorithms of the Vienna University of Technology.

Michael Wimmer is an associate professor at the Institute of Computer Graphics and Algorithms of the Vienna University of Technology, where he received an M.Sc. in 1997 and a Ph.D. in 2001. His current research interests are real-time rendering, computer games, real-time visualization of urban environments, point-based rendering and procedural modeling. He has coauthored many papers in these fields, was papers co-chair of EGSR 2008 and Pacific Graphics 2012, and is associate editor of Computers & Graphics.
