Proj 3 CUDA Path Tracer - Instructions

This is due Tuesday October 3rd at 11:59pm.

This project involves a significant amount of running time to generate high-quality images, so be sure to take that into account. You will receive an additional 2 days (due Thursday, October 5th) for "README and Scene" only updates. However, the standard project requirements for READMEs still apply for the October 3rd deadline. You may use these two extra days to improve your images, charts, performance analysis, etc.

If you plan to use late days on this project (which we recommend), they will apply to the October 5th deadline. Once you have used your extra days and submitted the project, you will receive the additional 2 days for "README and Scene" updates only.

Link to "Pathtracing Primer" Slides

Summary:

In this project, you'll implement a pathtracing denoiser that uses geometry buffers (G-buffers) to guide a smoothing filter.

We would like you to base your technique on the paper "Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering," by Dammertz, Sewtz, Hanika, and Lensch. You can find the paper here: https://jo.dreggn.org/home/2010_atrous.pdf

Denoisers can help produce a smoother appearance in a pathtraced image with fewer samples-per-pixel/iterations, although the actual improvement often varies from scene-to-scene. Smoothing an image can be accomplished by blurring pixels - a simple pixel-by-pixel blur filter may sample the color from a pixel's neighbors in the image, weight them by distance, and write the result back into the pixel.
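As a rough illustration of the simple blur described above, here is a minimal CPU-side sketch in C++. The function name `simpleBlur`, the single-channel row-major image layout, and the Gaussian falloff are assumptions made for this example; they are not part of the base code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: blur a single-channel, row-major image.
// Each neighbor within `radius` pixels is weighted by a Gaussian of its
// pixel distance, and the weighted average is written to the output pixel.
std::vector<float> simpleBlur(const std::vector<float>& src, int width, int height,
                              int radius, float sigma) {
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            float weightSum = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int nx = x + dx;
                    int ny = y + dy;
                    // Skip neighbors that fall outside the image.
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    float d2 = static_cast<float>(dx * dx + dy * dy);
                    float w = std::exp(-d2 / (2.0f * sigma * sigma));
                    sum += w * src[ny * width + nx];
                    weightSum += w;
                }
            }
            // The center pixel always contributes weight 1, so weightSum > 0.
            dst[y * width + x] = sum / weightSum;
        }
    }
    return dst;
}
```

Because every neighbor is weighted by distance alone, a bright pixel bleeds across any edge it sits next to, which is the loss of detail that G-buffer guidance is meant to avoid.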

However, just running a simple blur filter on an image often reduces the amount of detail, smoothing sharp edges. This can get worse as the blur filter gets larger, or with more blurring passes. Fortunately in a 3D scene, we can use per-pixel metrics to help the filter detect and preserve edges.

raw pathtraced image	simple blur	blur guided by G-buffers

These per-pixel metrics can include scene geometry information (hence G-buffer), such as per-pixel normals and per-pixel positions, as well as surface color or albedo for preserving detail in mapped or procedural textures. For the purposes of this assignment we will only require per-pixel metrics from the "first bounce."

per-pixel normals	per-pixel positions (scaled down)	???! (dummy data, time-of-flight)

Building on Project 3 CUDA Path Tracer

We highly recommend that you integrate denoising into your Project 3 CUDA Path Tracers.

This project's base code is forked from the CUDA Path Tracer basecode in Project 3, and exists so that the assignment can stand on its own as well as provide some guidance on how to implement some useful tools. The main changes are that we have added some GUI controls, a very simple pathtracer without stream compaction, and G-buffer with some dummy data in it.

You may choose to use the base code in this project as a reference, playground, or as the base code for your denoiser. Using it as a reference or playground will allow you to understand the changes that you need for integrating the denoiser. Like Project 3, you may also change any part of the base code as you please. This is YOUR project.

Recommendations:

Every image you save should automatically get a different filename. Don't delete all of them! For the benefit of your README, keep a bunch of them around so you can pick a few to document your progress at the end. Outtakes are highly appreciated!
Remember to save your debug images - these will make for a great README.
Also remember to save and share your bloopers. Every image has a story to tell and we want to hear about it.

src/ C++/CUDA source files.
scenes/ Example scene description files.
img/ Renders of example scene description files. (These probably won't match precisely with yours.)
- note that we have added a cornell_ceiling_light scene
- simple pathtracers often benefit from scenes with very large lights
external/ Includes and static libraries for 3rd party libraries.
imgui/ Library code from https://github.com/ocornut/imgui

Running the code

The main function requires a scene description file. Call the program with one as an argument: cis565_denoiser scenes/cornell_ceiling_light.txt. (In Visual Studio, ../scenes/cornell_ceiling_light.txt.)

If you are using Visual Studio, you can set this in the Debugging > Command Arguments section in the Project Properties. Make sure you get the path right - read the console for errors.

Controls

Esc to save an image and exit.
S to save an image. Watch the console for the output filename.
Space to re-center the camera at the original scene lookAt point.
Left mouse button to rotate the camera.
Right mouse button on the vertical axis to zoom in/out.
Middle mouse button to move the LOOKAT point in the scene's X/Z plane.

Requirements

In this project, you are given code for:

Loading and reading the scene description format.
Sphere and box intersection functions.
Support for saving images.
Working CUDA-GL interop for previewing your render while it's running.
A skeleton renderer with:
- Naive ray-scene intersection.
- A "fake" shading kernel that colors rays based on the material and intersection properties but does NOT compute a new ray based on the BSDF.

Ask in Ed Discussion for clarifications.

Part 1 - Core Features

You will need to implement the following features:

A shading kernel with BSDF evaluation for:
- Ideal Diffuse surfaces (using provided cosine-weighted scatter function, see below.) [PBRT 8.3].
- Perfectly specular-reflective (mirrored) surfaces (e.g. using glm::reflect).
- See notes on diffuse/specular in scatterRay and on imperfect specular below.
Path continuation/termination using Stream Compaction from Project 2.
After you have a basic pathtracer up and running, implement a means of making rays/pathSegments/intersections contiguous in memory by material type. This should be easily toggleable.
- Consider the problems with coloring every path segment in a buffer and performing BSDF evaluation using one big shading kernel: different materials/BSDF evaluations within the kernel will take different amounts of time to complete.
- Sort the rays/path segments so that rays/paths interacting with the same material are contiguous in memory before shading. How does this impact performance? Why?
A toggleable option to cache the first bounce intersections for re-use across all subsequent iterations. Provide performance benefit analysis across different max ray depths.
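The compaction and material-sorting steps above can be sketched on the CPU with the standard library. This is only an illustration under assumed names: `PathState`, `compactPaths`, and `sortByMaterial` are hypothetical and do not appear in the base code. On the GPU you would use your Project 2 stream compaction, or Thrust's analogous `thrust::partition` and `thrust::sort_by_key`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for a path segment plus its intersection.
struct PathState {
    int pixelIndex;
    int materialId;
    int remainingBounces;
};

// Stream compaction: move paths that are still alive to the front of the
// buffer (preserving their order) and return how many are alive.
int compactPaths(std::vector<PathState>& paths, int numPaths) {
    auto mid = std::stable_partition(
        paths.begin(), paths.begin() + numPaths,
        [](const PathState& p) { return p.remainingBounces > 0; });
    return static_cast<int>(mid - paths.begin());
}

// Material sorting: make the first numPaths entries contiguous by material,
// so neighboring threads in a shading kernel evaluate the same BSDF.
void sortByMaterial(std::vector<PathState>& paths, int numPaths) {
    std::stable_sort(
        paths.begin(), paths.begin() + numPaths,
        [](const PathState& a, const PathState& b) {
            return a.materialId < b.materialId;
        });
}
```

A typical bounce loop would call `compactPaths` after shading and only launch the next bounce over the returned count; wrapping `sortByMaterial` in a runtime flag keeps it easily toggleable for the performance comparison.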

Part 2 - Make Your Pathtracer Unique!

The following features are a non-exhaustive list of features you can choose from based on your own interests and motivation. Each feature has an associated score (represented in emoji numbers, eg. 5️⃣).

You are required to implement additional features of your choosing from the list below, totaling at least 10 score points.

An example set of optional features is:

Mesh Loading - 4️⃣ points
Refraction - 2️⃣ points
Anti-aliasing - 2️⃣ points
Final rays post processing - 3️⃣ points

This list is not comprehensive. If you have a particular idea you would like to implement (e.g. acceleration structures, etc.), please post on Ed.

Extra credit: implement more features on top of the above required ones, with point value up to +25/100 at the grader's discretion (based on difficulty and coolness).

Visual Improvements

2️⃣ Refraction (e.g. glass/water) [PBRT 8.2] with Fresnel effects using Schlick's approximation or more accurate methods [PBRT 8.5]. You can use glm::refract for Snell's law.
- Recommended but not required: non-perfect specular surfaces. (See below.)
2️⃣ Physically-based depth-of-field (by jittering rays within an aperture). [PBRT 6.2.3]
2️⃣ Stochastic Sampled Antialiasing. See Paul Bourke's notes. Keep in mind how this influences the first-bounce cache in part 1.
4️⃣ Procedural Shapes & Textures.
- You must generate a minimum of two different complex shapes procedurally. (Not primitives)
- You must be able to shade objects with a minimum of two different textures.
5️⃣ (6️⃣ if combined with Arbitrary Mesh Loading) Texture mapping [PBRT 10.4] and Bump mapping [PBRT 9.3].
- Implement file-loaded textures AND a basic procedural texture
- Provide a performance comparison between the two
2️⃣ Direct lighting (by taking a final ray directly to a random point on an emissive object acting as a light source). Or more advanced [PBRT 15.1.1].
4️⃣ Subsurface scattering [PBRT 5.6.2, 11.6].
3️⃣ Better random number sequences for Monte Carlo ray tracing
3️⃣ Some method of defining object motion, and motion blur by averaging samples at different times in the animation.
3️⃣ Use final rays to apply post-processing shaders. Please post your ideas on Piazza before starting.
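For the refraction item above, Schlick's approximation estimates the Fresnel reflectance as R(θ) = R0 + (1 - R0)(1 - cos θ)^5, with R0 = ((n1 - n2) / (n1 + n2))^2. A minimal sketch follows; the function name and parameter names are chosen for this example only.

```cpp
#include <cmath>

// Schlick's approximation of Fresnel reflectance.
// cosTheta:       cosine of the angle between the incident ray and the normal
// etaIncident:    index of refraction of the medium the ray is leaving
// etaTransmitted: index of refraction of the medium the ray is entering
// Note: this sketch does not detect total internal reflection.
float schlickReflectance(float cosTheta, float etaIncident, float etaTransmitted) {
    float r0 = (etaIncident - etaTransmitted) / (etaIncident + etaTransmitted);
    r0 *= r0;
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}
```

One common way to use the result is to draw a uniform random number per bounce and reflect the ray when it falls below the reflectance, refracting otherwise. Total internal reflection needs separate handling.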

Mesh Improvements

Arbitrary mesh loading and rendering (e.g. glTF 2.0 (preferred) or obj files) with toggleable bounding volume intersection culling
- 4️⃣ glTF
- 2️⃣ OBJ
- For other formats, please check on the class forum
- You can find models online or export them from your favorite 3D modeling application. With approval, you may use a third-party loading code to bring the data into C++.
  - tinygltf is highly recommended for glTF.
  - tinyObj is highly recommended for OBJ.
  - obj2gltf can be used to convert OBJ to glTF files. You can find similar projects for FBX and other formats.
- You can use the triangle intersection function glm::intersectRayTriangle.
- Bounding volume intersection culling: reduce the number of rays that have to be checked against the entire mesh by first checking rays against a volume that completely bounds the mesh. For full credit, provide performance analysis with and without this optimization.
Note: This goes great with the Hierarchical Spatial Data Structures.
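One possible form of the bounding volume check is a ray versus axis-aligned box "slab" test, run before any per-triangle intersection. The sketch below is an assumption-laden illustration: `Vec3` and `rayHitsAabb` are hypothetical names, and in the project you would more likely use glm::vec3 and your own ray struct.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

// Slab test: returns true if the ray origin + t * dir (t >= 0) touches the
// axis-aligned box [bmin, bmax]. Only when this passes does the mesh need
// to be tested triangle by triangle.
bool rayHitsAabb(const Vec3& origin, const Vec3& dir,
                 const Vec3& bmin, const Vec3& bmax) {
    const float o[3] = {origin.x, origin.y, origin.z};
    const float d[3] = {dir.x, dir.y, dir.z};
    const float lo[3] = {bmin.x, bmin.y, bmin.z};
    const float hi[3] = {bmax.x, bmax.y, bmax.z};
    float tNear = 0.0f;
    float tFar = std::numeric_limits<float>::max();
    for (int axis = 0; axis < 3; ++axis) {
        if (std::fabs(d[axis]) < 1e-8f) {
            // Ray is parallel to this slab: it misses unless the origin
            // already lies between the two planes.
            if (o[axis] < lo[axis] || o[axis] > hi[axis]) return false;
            continue;
        }
        float t0 = (lo[axis] - o[axis]) / d[axis];
        float t1 = (hi[axis] - o[axis]) / d[axis];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false;  // slab intervals do not overlap
    }
    return true;
}
```

Keeping the box test behind a toggle makes it straightforward to collect the with/without timings requested above.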

Performance Improvements

2️⃣ Work-efficient stream compaction using shared memory across multiple blocks. (See GPU Gems 3, Chapter 39.)
- Note that you will NOT receive extra credit for this if you implemented shared memory stream compaction as extra credit for Project 2.
6️⃣ Hierarchical spatial data structures - for better ray/scene intersection testing
- BVH or Octree recommended - this feature is more about traversal on the GPU than perfect tree structure
- CPU-side data structure construction is sufficient - GPU-side construction was a final project.
- Make sure this is toggleable for performance comparisons
- If implemented in conjunction with Arbitrary mesh loading (required for this year), this qualifies as the toggleable bounding volume intersection culling.
- See below for more resources
6️⃣ Wavefront pathtracing: Group rays by material without a sorting pass. A sane implementation will require considerable refactoring, since every supported material suddenly needs its own kernel.
5️⃣ Open Image AI Denoiser or an alternative approved image denoiser. Open Image Denoiser is an image denoiser that works by applying a filter to Monte Carlo-based pathtracer output. The denoiser runs on the CPU and takes in path tracer output from 1spp to beyond. To get full credit for this, you must pass in at least one extra buffer along with the raw "beauty" buffer. Ex: Beauty + Normals.
- Part of this extra credit is figuring out where the filter should be called, and how you should manage the data for the filter step.
- It is important to note that integrating this is not as simple as it may seem at first glance. Library integration, buffer creation, device compatibility, and more are all real problems which will appear, and it may be hard to debug them. Please only try this if you have finished Part 2 early and would like extra points. While this is difficult, the result would be a significantly faster resolution of the path traced image.
5️⃣ Re-startable Path tracing: Save some application state (iteration number, samples so far, acceleration structure) so you can start and stop rendering instead of leaving your computer running for hours on end (which will happen in this project).
5️⃣ Switch the project from using CUDA-OpenGL Interop to using CUDA-Vulkan interop (this is a really great one for those of you interested in doing Vulkan). Talk to TAs if you are planning to pursue this.

Optimization

For those of you who are not as interested in the topic of rendering, we encourage you to focus on optimizing the basic path tracer using GPU programming techniques and more advanced CUDA features. In addition to the core features, we do recommend at least implementing an OBJ mesh loader before focusing on optimization, so that you can load heavy geometry and start seeing performance hits. Please refer to the course materials (especially the CUDA Performance lecture) and the CUDA Best Practices Guide on how to optimize CUDA performance. Some examples include:

Use shared memory to improve memory bandwidth
Use intrinsic functions to improve instruction throughput
Use CUDA streams and/or graph for concurrent kernel executions

For each specific optimization technique, please post on Ed Discussion so we can determine the appropriate points to award.

Analysis

For each extra feature, you must provide the following analysis:

Overview write-up of the feature along with before/after images.
Performance impact of the feature
If you did something to accelerate the feature, what did you do and why?
Compare your GPU version of the feature to a HYPOTHETICAL CPU version (you don't have to implement it!). Does it benefit or suffer from being implemented on the GPU?
How might this feature be optimized beyond your current implementation?

Base Code Tour

You'll be working in the following files. Look for important parts of the code:

Search for CHECKITOUT.
You'll have to implement parts labeled with TODO. (But don't let these constrain you - you have free rein!)

Requirements

Ask in Ed for clarifications.

thrust::default_random_engine rng(hash(index));
thrust::uniform_real_distribution<float> u01(0, 1);
float result = u01(rng);

One meta-goal for this project is to help you gain some experience in reading technical papers and implementing their concepts. This is an important skill in graphics software engineering, and will also be helpful for your final projects.

thrust::default_random_engine rng = makeSeededRandomEngine(iter, index, path.remainingBounces);

Try to look up anything that you don't understand, and feel free to discuss with your fellow students on Ed. We were also able to locate presentation slides for this paper that may be helpful: https://www.highperformancegraphics.org/previous/www_2010/media/RayTracing_I/HPG2010_RayTracing_I_Dammertz.pdf

This paper is also helpful in that it includes a code sample illustrating some of the math, although not all the details are given away - for example, parameter tuning in denoising can be very implementation-dependent.

This project will focus on this paper; however, it may be helpful to read some of the references as well as more recent papers on denoising, such as "Spatiotemporal Variance-Guided Filtering" from NVIDIA, available here: https://research.nvidia.com/publication/2017-07_Spatiotemporal-Variance-Guided-Filtering%3A

Part 2 - A-trous wavelet filter

Implement the A-trous wavelet filter from the paper. 🤷

It's always good to break down techniques into steps that you can individually verify. Such a breakdown for this paper could include:

add UI controls to your project - we've done this for you in this base code, but see Base Code Tour
implement G-Buffers for normals and positions and visualize them to confirm (see Base Code Tour)
implement the A-trous kernel and its iterations without weighting and compare with a blur applied from, say, GIMP or Photoshop
use the G-Buffers to preserve perceived edges
tune parameters to see if they respond in ways that you expect
test more advanced scenes
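To make the steps above more concrete, here is a simplified CPU sketch of a single edge-avoiding A-trous pass on a grayscale image. It is loosely modeled on the approach described in the paper and is not a reproduction of the paper's code: the names `Float3`, `GBufferSample`, `edgeWeight`, and `atrousPass`, the exact weight formula, and the `*Phi` parameters are assumptions for this example. The kernel values are the 5-tap B3-spline coefficients commonly used for this filter.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Float3 { float x, y, z; };  // hypothetical minimal vector type

// Hypothetical per-pixel G-buffer entry (first-bounce normal and position).
struct GBufferSample {
    Float3 normal;
    Float3 position;
};

static float distSq(const Float3& a, const Float3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Edge-stopping weight: near 1 for similar samples, near 0 across an edge.
static float edgeWeight(float d2, float phi) {
    return std::min(std::exp(-d2 / phi), 1.0f);
}

// One A-trous pass: a 5x5 kernel whose taps are spaced stepWidth pixels apart.
// Calling this repeatedly with stepWidth = 1, 2, 4, ... widens the filter.
std::vector<float> atrousPass(const std::vector<float>& color,
                              const std::vector<GBufferSample>& gbuf,
                              int width, int height, int stepWidth,
                              float colorPhi, float normalPhi, float positionPhi) {
    static const float kernel[5] = {1.0f / 16, 1.0f / 4, 3.0f / 8, 1.0f / 4, 1.0f / 16};
    std::vector<float> out(color.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int p = y * width + x;
            float sum = 0.0f;
            float weightSum = 0.0f;
            for (int j = -2; j <= 2; ++j) {
                for (int i = -2; i <= 2; ++i) {
                    // Clamp taps to the image border.
                    int qx = std::min(std::max(x + i * stepWidth, 0), width - 1);
                    int qy = std::min(std::max(y + j * stepWidth, 0), height - 1);
                    int q = qy * width + qx;
                    float dc = color[p] - color[q];
                    float w = edgeWeight(dc * dc, colorPhi)
                            * edgeWeight(distSq(gbuf[p].normal, gbuf[q].normal), normalPhi)
                            * edgeWeight(distSq(gbuf[p].position, gbuf[q].position), positionPhi);
                    float k = kernel[i + 2] * kernel[j + 2];
                    sum += color[q] * k * w;
                    weightSum += k * w;
                }
            }
            // The center tap has weight 1, so weightSum is always positive.
            out[p] = sum / weightSum;
        }
    }
    return out;
}
```

Setting all three `*Phi` values very large makes every weight close to 1, which gives the unweighted blur from the third step and is a convenient way to verify the kernel before turning on the G-buffer terms.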

Base Code Tour

This base code is derived from Project 3. Some notable differences:

src/pathtrace.cu - we've added functions showGBuffer and showImage to help you visualize G-Buffer info and your denoised results. There's also a generateGBuffer kernel on the first bounce of pathtrace.
src/sceneStructs.h - there's a new GBufferPixel struct
- the term G-buffer is more common in the world of rasterizing APIs like OpenGL or WebGL, where many G-buffers may be needed due to limited pixel channels (RGB, RGBA)
- in CUDA we can pack everything into one G-buffer with comparatively huge pixels.
- at the moment this just contains some dummy "time-to-intersect" data so you can see how showGBuffer works.
src/main.h and src/main.cpp - we've added a bunch of ui_ variables - these connect to the UI sliders in src/preview.cpp, and let you toggle between showGBuffer and showImage, among other things.
scenes - we've added cornell_ceiling_light.txt, which uses a much larger light and fewer iterations. This can be a good scene to start denoising with, since even in the first iteration many rays will terminate at the light.
As usual, be sure to search across the project for CHECKITOUT and TODO

Scene File Format

Note: The Scene File Format and sample scene files are provided as a starting point. You are encouraged to create your own unique scene files, or even modify the scene file format in its entirety. Be sure to document any changes in your readme.

This project uses a custom scene description format. Scene files are flat text files that describe all geometry, materials, lights, cameras, and render settings inside of the scene. Items in the format are delimited by new lines, and comments can be added using C-style // comments.

Note that "acceptably smooth" is somewhat subjective - we will leave the means for image comparison up to you, but image diffing tools may be a good place to start, and can help visually convey differences between two images.

Extra Credit

The following extra credit items are listed roughly in order of level-of-effort, and are just suggestions - if you have an idea for something else you want to add, just ask on Ed!

G-Buffer optimization

When starting out with G-buffers, it's probably easiest to start storing per-pixel positions and normals as glm::vec3s. However, this can be a decent amount of per-pixel data, which must be read from memory.

Implement methods to store positions and normals more compactly. Two places to start include:

storing Z-depth instead of position, and reconstruct position based on pixel coordinates and an inverted projection matrix
oct-encoding normals: http://jcgt.org/published/0003/02/01/paper.pdf
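As a starting point for the second item, octahedral encoding maps a unit normal onto a 2D square by projecting it onto the octahedron |x| + |y| + |z| = 1 and folding the lower hemisphere over the diagonals. The sketch below follows that general scheme; the type and function names (`Normal3`, `Oct2`, `octEncode`, `octDecode`) are assumptions for this example, and the input is assumed to be unit length.

```cpp
#include <cmath>

struct Normal3 { float x, y, z; };  // hypothetical unit-length normal
struct Oct2 { float u, v; };        // encoded normal, each component in [-1, 1]

static float signNotZero(float f) { return f >= 0.0f ? 1.0f : -1.0f; }

// Project the normal onto the octahedron, then fold the lower hemisphere
// (z < 0) outward so the whole sphere maps into the [-1, 1]^2 square.
Oct2 octEncode(const Normal3& n) {
    float invL1 = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
    float u = n.x * invL1;
    float v = n.y * invL1;
    if (n.z < 0.0f) {
        float foldedU = (1.0f - std::fabs(v)) * signNotZero(u);
        float foldedV = (1.0f - std::fabs(u)) * signNotZero(v);
        u = foldedU;
        v = foldedV;
    }
    return {u, v};
}

// Invert the mapping: recover z from the L1 constraint, unfold if needed,
// then renormalize to unit length.
Normal3 octDecode(const Oct2& e) {
    float x = e.u;
    float y = e.v;
    float z = 1.0f - std::fabs(x) - std::fabs(y);
    if (z < 0.0f) {
        float unfoldedX = (1.0f - std::fabs(y)) * signNotZero(x);
        float unfoldedY = (1.0f - std::fabs(x)) * signNotZero(y);
        x = unfoldedX;
        y = unfoldedY;
    }
    float invLen = 1.0f / std::sqrt(x * x + y * y + z * z);
    return {x * invLen, y * invLen, z * invLen};
}
```

Storing two floats already saves one float per pixel; the larger savings come from quantizing `u` and `v` to 16 or 8 bits each, at some cost in precision that is worth measuring in your comparison.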

Be sure to provide performance comparison numbers between optimized and unoptimized implementations.

Use of any third-party code must be approved by asking on our Ed Discussion.
If it is approved, all students are welcome to use it. Generally, we approve use of third-party code that is not a core part of the project. For example, for the path tracer, we would approve using a third-party library for loading models, but would not approve copying and pasting a CUDA function for doing refraction.
Third-party code MUST be credited in README.md.
Using third-party code without its approval, including using another student's code, is an academic integrity violation, and will, at minimum, result in you receiving an F for the semester.
You may use third-party 3D models and scenes in your projects. Be sure to provide the right attribution as requested by the creators.

Dammertz et al. mention in their section 2.2 that A-trous filtering is a means for approximating Gaussian filtering. Implement Gaussian filtering and compare with A-trous to see if one method is significantly faster. Also note any visual differences in your results.

Shared Memory Filtering

Sell your project.
Assume the reader has a little knowledge of path tracing - don't go into detail explaining what it is. Focus on your project.
Don't talk about it like it's an assignment - don't say what is and isn't "extra" or "extra credit." Talk about what you accomplished.
Use this to document what you've done.
DO NOT leave the README to the last minute!
- It is a crucial part of the project, and we will not be able to grade you without a good README.
- Generating images will take time. Be sure to account for it!

Be sure to provide performance comparison numbers between implementations with and without shared memory. Also pay attention to how shared memory use impacts the block size for your kernels, and how this may change as the filter width changes.

This is a renderer, so include images that you've made!
Be sure to back your claims for optimization with numbers and comparisons.
If you reference any other material, please provide a link to it.
You will not be graded on how fast your path tracer runs, but getting close to real-time is always nice!
If you have a fast GPU renderer, it is very good to showcase this with a video to show interactivity. If you do so, please include a link!

High-performance raytracers in dynamic applications (like games, or real-time visualization engines) now often use temporal sampling, borrowing and repositioning samples from previous frames so that each frame effectively only computes 1 sample-per-pixel but can denoise from many frames.

Stream compaction helps most after a few bounces. Print and plot the effects of stream compaction within a single iteration (i.e. the number of unterminated rays after each bounce) and evaluate the benefits you get from stream compaction.
Compare scenes which are open (like the given cornell box) and closed (i.e. no light can escape the scene). Again, compare the performance effects of stream compaction! Remember, stream compaction only affects rays which terminate, so what might you expect?
For optimizations that target specific kernels, we recommend using stacked bar graphs to convey total execution time and improvements in individual kernels. For example:

Note that our basic pathtracer doesn't do animation, so you will also need to implement some kind of dynamic aspect in your scene - this may be as simple as an automated panning camera, or as complex as translating models.

See https://research.nvidia.com/publication/2017-07_Spatiotemporal-Variance-Guided-Filtering%3A for more details.

Submission

If you have modified any of the CMakeLists.txt files at all (aside from the list of SOURCE_FILES), mention it explicitly.

Beware of any build issues discussed on the Piazza.

Open a GitHub pull request so that we can see that you have finished.

The title should be "Project 3: YOUR NAME".

The template of the comment section of your pull request is attached below, you can do some copy and paste:

Repo Link
(Briefly) Mention the features that you've completed, especially those bells and whistles you want to highlight.
- Feature 0
- Feature 1
- ...
Feedback on the project itself, if any.

References

[PBRT] Physically Based Rendering, Second Edition: From Theory To Implementation. Pharr, Matt and Humphreys, Greg. 2010.
Antialiasing and Raytracing. Chris Cooksey and Paul Bourke, http://paulbourke.net/miscellaneous/aliasing/
Sampling notes from Steve Rotenberg and Matteo Mannino, University of California, San Diego, CSE168: Rendering Algorithms
Path Tracer Readme Samples (non-exhaustive list):
