Bounding Volume Hierarchy (BVH) trees are ubiquitous in modern ray tracing applications and help accelerate the search for ray-triangle intersections in a scene by organizing geometry into progressively tighter axis-aligned bounding boxes.
This document is based on the survey on bounding volume hierarchies for ray tracing 1.
The Ultimate Guide to Bounding Volume HierarchiesConstructionThe Surface Area Heuristic Other HeuristicsRDH - Ray Distribution HeuristicSolid Angle ProjectionOSAH - Occlusion Heuristic EPO - End Point OverlapLCV - Leaf Count VariabilityCWBVH - Compressed Wide BVHAlgorithmsTop-Down ConstructionSplitTerminateBottom-Up ConstructionSome Commonly Used AlgorithmsBBVH - Binned BVHSBVH - Spatial BVHLBVH - Linear BVHHLBVH - Hierarchical Linear BVHAAC - Approximate Agglomerative ClusteringTRBVH - Treelet Restructuring BVHPLOC - Parallel Locally-Ordered ClusteringPRBVH - Parallel Reinsertion BVHPerformanceOther BVHsMulti-Level HierarchiesWide BVHBVH LayoutTree LayoutCompressionWhy BVH?
The optimal acceleration structure for ray tracing is believed to be an NP-hard problem2, so construction relies on heuristics and greedy algorithms. Since the goal of using an acceleration structure is to accelerate ray traversal, the ultimate goal is to reduce rendering time. Heuristics attempt to calculate the relative rendering time using different BVH structures.
The SAH assumes a few properties:
The probability of an intersection is approximated by the surface area of the node. The larger the surface area, the more likely a random infinite ray will intersect the node. Each intersection is associated with a cost and the cost is minimized by bounding the geometry using the minimal surface area.
The expected cost of traversing a BVH tree is the sum of the costs of each interaction (i.e. ray-box, ray-triangle), weighted by their surface area relative to the total size of the root node.
Although SAH is the most common heuristic because it applies to the most general case, there are also other heuristics to optimize for specific cases.
The assumptions made with SAH that rays are infinitely long is not true and the cost can change drastically with the average length of rays. RDH considers real ray distribution by sampling a small set of rays and tracking the number of hits on each node.
Instead of using the proportional surface area, use the proportional number of hits at each node.
The first SAH asssumption that requires rays to originate outside the root node bounding box is very unrealistic for secondary rays. Secondary rays originate from an intersection point in the scene and thus always originate inside the root node bounding volume.
The probability of a ray hitting node
It may be useful to consider occlusion and proximity of geometry to ray in cost model when building BVH. Geometry marked as occluded is preferred to be built into a different subtree. While they can still be intersected, the traversal cost is higher for previously occluded geometry in this heuristic.
Occlusion information can be gathered from previous frames in an animation or by sampling a representative set of rays.
Overlapping nodes lower performance because a ray must traverse multiple subtrees to find the closest hit.
In this metric,
In SIMD execution, warps remain together in each part of the while loop so variances in leaf nodes delay the inner node loop.
This measures the standard deviation of the number of leaf nodes
Special cost functions are used to convert a binary BVH to wide BVH trees. One case of a wide BVH tree is the compressed wide BVH tree that has up to eight children per node.
When appropriate, nodes are either converted to one of eight child nodes or an inner node with a distributed forest with
In general, BVH build algorithms can be categorized either as bottom-up or top-down, each with their own advantages. Starting from the top helps focus optimizations at the top levels of the tree, which may be more critical to traversal performance. However, starting from the bottom optimizes at a finer granularity and usually generates BVH trees with a lower global cost.
xxxxxxxxxx
push root onto stack
while stack is not empty:
pop node from stack
if terminate then
make node leaf
else
split primitives into children
assign children as child nodes
foreach child in node:
push child onto stack
The main aspects of top-down construction is defined by the terminate
function and the split
function.
The three most common approaches to splitting a given node are as follows:
Using a cost model is slower but generates better quality BVH trees than using spatial or object splits. However, the SAH cannot be calculated until the entire tree is complete. From a top-down approach, costs must be approximated.
The approximation of the SAH treats unbuilt subtrees as leaf nodes and calculates the cost as
To avoid calculating the cost of every possible split, which can be especially difficult at the top of the tree, primitives are often binned into regions so that only a subset of all possible splitting planes are evaluated.
In the simplest method, the top-down approach terminates when the approximated cost is higher than the cost of a leaf node.
Bottom-up builds generally start by clustering primitives. In each iteration, candidate clusters are merged together until all primitives have been merged into a single structure.
xxxxxxxxxx
while clusters > 1:
identify nearest clusters
assign clusters as child nodes
merge into parent cluster
The lowest cost is created by merging the closest pairs in each iteration. However, determining the closest cluster pair may be too time consuming and approximations can be made using morton curves.
Here is an overview of the most commonly used BVH building algorithms. BVH trees can also be further optimized with insertion-based methods or refitted for dynamic scenes.
👍 : Good quality, improves on Kd trees
❌ : Slow
👍 : Very high quality
❌ : Slow build, more primitive references
👍 : Very fast, parallel sort can be parallelized
❌ : Does not optimize for SAH
Hybrid: A hybrid version of LBVH + SAH can be created by using LBVH for initial splits, then SAH when there is enough splits to efficiently parallelize.
👍 : Very fast, avoids singleton nodes
❌ : Lower quality than SAH, may perform worse than LBVH in some scenes
Hybrid: A hybrid version of SAH + HLBVH can be created by using coarse significant bits of Morton code as bins for SAH, then using HLBVH for the remaining bits.
👍 : Bottom-up approach, high quality, can be parallelized
❌ : Requires large stack state
👍 : Generates high quality trees, parallel algorithm
❌ : Requires an initial BVH, localized optimizations
👍 : Targets GPUs, high quality trees, fully bottom-up approach
❌ : Less effective on smaller scenes
👍 : Optimizes global cost, parallel algorithm
❌ : Requires an initial BVH
Some works evaluate these BVH build algorithms for their actual ray tracing performance versus their SAH cost. Hippie includes most implementations.
To date, the best ground-up BVH builder is SBVH. However, additional modifications such as PRBVH can improve SBVH further.
Structures beyond the basic binary BVH can be efficient for specific situations such as dynamic geometry or SIMD traversal.
To support dynamic scene geometry and accelerate BVH builds for large scenes, modern BVH trees are commonly organized into a two level hierarchy, featuring a top-level acceleration structure (TLAS) and many bottom-level acceleration structures (BLAS).
BLAS structures are regular BVH trees holding the geometry primitives. However, one BLAS object may be instanced many times in the scene. The TLAS structure builds a BVH tree over instances of BLAS objects. The leaf nodes of a TLAS include a transformtion from world-space into the instance-space and a pointer to the appropriate BLAS.
Some APIs offer more than two levels of division, or the option of a programmable instance with traversal shaders17. Other works re-structure multi-level hierarchies through a process called re-braiding18 by building the TLAS over BLAS subtrees in order to improve SAH costs.
SIMD ray traversals can benefit from wide BVH trees by performing ray intersection tests in parallel at each step. However, there is still poor SIMD efficiency when the entire group must traverse a subtree only few threads hit. There are also more empty slots in wide BVH trees that limit efficiency.
Wide BVH trees can either be collapsed from a binary BVH tree or built on their own.
Beyond the construction of BVH structures and the associated performance metrics, another key component is the memory layout. The way BVH nodes are stored in memory significantly affects the traversal performance. There are optimizations to promote cache hit rates, to take advantage of locality, and to reduce memory footprint.
A BVH node needs to store at least the bounding volume (represented by a minimum and maximum extent in 3D space) and the location of its children (can be implicit, indexed, addressed, etc.).
BVH trees are usually mapped to memory in relation to the traversal patterns. The most basic forms are stored in memory in Depth-First Order and Breadth-First Order. Alternatively, proposals including Cache-Oblivious BVH (COLBVH)19 and Swapped Subtrees (SWST)20 organize the tree into subtree clusters in memory. However, the tree layout does not significantly affect performance and speedups vary by application and scene.
Data size of BVH trees are greatly affected by bounding boxes and child pointers. The bounding boxes can easily be expressed relative to its parent for a reduced storage size. These values are also often quantized in a conservative fashion. In addition, child pointers are often stored as offsets. Several works explore different ways to compress this data:
Although BVH trees are the default standard in ray tracing applications today, ray tracing did not always rely on BVHs. Earlier alternative spatial structures to accelerate ray traversal include the following:
Acceleration Structure | Details |
---|---|
Uniform Grid | - Bounding boxes are subdivided in 3 dimensions (uniform or non-uniform) - Number of voxels is derived from number of objects in scene |
Binary Space Partitioning (BSP) | - Binary space partitioning - Each node is split equality into two child boxes of the same size |
Kd-Tree | - Splitting planes are placed based on cost function - Each node is split into two child boxes by splitting plane |
Octree | - Each box is subdivided into eight child boxes of the same size |
Grid Hierarchy | - Each cell of the uniform grid is subdivided with other uniform grid |
Some of the reasons why BVH prevailed are as follows:
A full comparison of BVH versus Kd-tree is evaluated in Performance Comparison of Bounding Volume Hierarchies and Kd-Trees for GPU Ray Tracing23