### Introduction

This article demonstrates basic implementation of a raytracer on a CUDA graphics card. It covers multiple reflections, refractions and anti-aliasing. The implementation is based on PhD Patricio Bulić’s sequential raytracer code.

### Acknowledgements and beforereading

Before I begin the explanation of the workings of the raytracer, first I need to point out that the code base for this algorithm was written by professor Patricio Bulić of the University of Ljubljana, Faculty of Computer and Information Science. The code he presented was a seqential variant of a simple raytracer. Functions for basic raycasting and ray operations were implemented in his code, as well as structures for rays, spheres, intersections and Lambertian and Blinn-Phong shading.

His code and his lecture and presentation in Computer graphics insipred this article. Mr Bulić’s original code is available here.

Before you begin reading this article, you might want to familiarize yourself with the basic concepts of OpenGL, CUDA and raytracing.

Some time ago, I wrote an article that addressed the CUDA – OpenGL Interop: A million particles in CUDA and OpenGL. This might be a good starting point for understanding both OpenGL and CUDA and their interoperability. In that article, there are links to other tutorials and articles written about CUDA and OpenGL Interop. I will assume you read this article and know how vertex buffer object (VBO) works and how CUDA and OpenGL communicate.

At the end of the article, there are links to related tutorials which might help you better understand OpenGL and raytracing. This article is concerned with implementing a parallel raytracer on CUDA.

Raytracing is a very specific topic. The implementation of this raytracer exceeded 900 lines of code. I tried to use less code in this explanation not to draw attention away from the explanations, but the code is provided with the article. Perhaps the best way to understand this article is to open the source code and read through the article and the code in parallel. Try to find and understand each paragraph in the code before proceeding to the next.

### Raytracing basics

Raytracing is a rendering procedure that involves simulating the paths of light in a scene. More precisely, it simulates the paths the light would take from the light source to the eye, but does so starting from the eye.

Raytracing assumes that a viewer is looking at a 3D scene through a window, and the image is rendered onto the window. In a 3D scene, that window is a clipped plane defined in 3D space. It is as if the computer screen is the window placed in the 3D scene, and we are rendering what an eye sees through this window.

For each pixel of the screen, we cast one ray from the eye, through that pixel, into the 3D scene and then follow the ray as it travels through the scene, reflecting and refracting through objects in the scene.

If a ray hits an object in the scene, the lighting of that object is calculated for that hit point. This is done by checking the position of the light, relative to ray’s point of origin and the hit point.

There is a large number of ways to model the coloring of the specific pixel. Some models are more realistic than others, but for this example, we will use Blinn-Phong shading model, as it is sufficiently realistic.

However, this type of light modelling models only the light interaction for a single object and does not consider the reflections and refractions of other objects on the object the ray hit. This is where the idea of raytracing becomes clear. When a ray hits an object, based on the object’s material’s properties, another ray is cast from the hitpoint, further into the scene, and the color of the next hit is added to the pixel the ray was originally cast from. This way, the pixel will be colored with added color of both objects the ray hits. The number of reflections and refractions is determined in the algorithm. Obviously, the more the reflections, more realistic the scene. However, modelling three reflections seems to be sufficient.

If you are not acquainted with the notions of reflection and refraction, you might want to pause here and read more about them. I will, however, remind you that for reflections, the following stands:

*The ray that hits a surface with some angle to the normal (to the surface’s normal vector), reflects at the same angle on the opposite side of the normal.*

The refraction is more complicated, as it takes into consideration the refractive index. The refractive index determines the direction of the refracted ray on the opposite side of the surface it hits. When a ray hits a surface, if the surface is transparent, part of that ray will go through the surface. The refractive index determines what this direction will be. If the refractive index is 1.0, the ray will not change direction.

By combining the Blinn-Phong shading model and raytracing, it is possible to achieve high degrees of realism.

However, there is a limit to how realistic the scene can look, besides the number of reflections and refractions allowed. Lights reflect from the surfaces of other objects and shine onto other objects, thereby making other objects light sources and creating global illumination and light caustics. This is not modelled by raytracing. A very crude approximation of this effect is done through the Blinn-Phong’s ambient component. However, in a real scene, this is far from being that simple. Methods exist to deal with this. Examples of this are photon mapping and global illumination. However, we will not model this here.

### Parallel model

Since we cast a ray for each pixel of the window, this process is trivially parallelizable. Basically, all we need to do to parallelize the raytracer is to have a separate thread for each pixel of the window. CUDA is ideal for this, as it provides kernel invocation that allows calling the same function in a grid of threads.

I assume you know the basics of CUDA, so I will not explain how the kernels (CUDA functions that can be launched from host) work. I will only point out that the same kernel is launched in a grid of threads grouped in blocks. The x and y indices of the thread determine the pixel for which the thread is calculating color.

To render the image faster, we will use a vertex buffer object. VBO is, essentially, an array of values that OpenGL uses to draw vertices on the screen. Two float values in the buffer define the position of the vertex on the screen, while four bytes define the color of that specific pixel (the color could have been defined with three bytes, without the alpha component).

glBindBuffer(GL_ARRAY_BUFFER, vbo); glVertexPointer(2, GL_FLOAT, 12, 0); glColorPointer(4,GL_UNSIGNED_BYTE,12,(GLvoid*)8); glEnableClientState(GL_VERTEX_ARRAY); glEnableClientState(GL_COLOR_ARRAY); glDrawArrays(GL_POINTS, 0, WINDOW * WINDOW); glDisableClientState(GL_VERTEX_ARRAY);

The raytracing kernel should take as a parameter the VBO and write the appropriate pixel colors in its corresponding place in the VBO.

Since we need to cast a new ray (or two rays) when each ray hits an object, it is necessary to write recursive device functions for casting rays (each function should call either `reflection()`

or `refraction()`

for hits of all rays it casts). For this, you will need an Nvidia card with compute capability 2.0, because only these cards’ architectures support recursive device functions.

I am not sure why, but it seems that it isn’t possible to create two mutually recursive functions in CUDA. This is why our example will not have reflections that consider refractions and recursive refractions. Maybe it’s a bug in my code (if you find it, please e-mail me), but it seems not to be possible.

### Initialization and preparation of the scene

We are going to create a scene with four spheres, each with specific properties. The largest sphere will be at the bottom, providing the reflective surface for the other three spheres.

We need to define the material for each sphere, so we will use a struct that contains all the necessary information for a material. All the components of the Blinn-Phong model and transparency.

Also, it is necessary to define structures for calculating rays and intersections. We define structures SPHERE_INTERSECTION, RAY, VECTOR3D and VEC_BASIS, which are used for mathematical calculations necessary for the process of raytracing. I realize that the VECTOR3D is a redefinition, and that float3 could have been used instead, but I choose to retain professor Bulić’s original definition.

struct SPHERE_INTERSECTION { double lambda_in; double lambda_out; VECTOR3D normal; VECTOR3D point; bool valid; } ; struct SPHERE { VECTOR3D center; double radius; double kd_rgb[3]; double ks_rgb[3]; double ka_rgb[3]; double kr_rgb[3]; double refraction_index; double shininess; bool mirror; };

Professor Bulić defined vector operations on this structure, which is very convenient. Only modification I made to these functions is the` __device__`

prefix, necessary for the execution of these functions from kernels on the device. These are the functions for adding, subtracting, scaling and normalizing the vectors and the dot product of the vectors.

I added the cross product function. The cross product function is the result of a determinant you can find in the Wikipedia article for the cross product.

Professor Bulić wrote other raytracing functions: compute_ray, compute_reflected_ray, sphere_intersection and intersection_normal. These functions are used to calculate the rays’ trajectories and intersections and are essential for the raytracer. I will explain these later.

We need to initialize all the device variables, so we need to write a CUDA kernel which we will run once before we start our raytracers. In the code, I named this kernel init_kernel.

The initialization steps are as follows:

- Initialize GLUT and GLEW (if you need to setup your IDE to use OpenGL, you might want to take a look at these articles: CUDA development on Ubuntu and GPU Parallel Programming in VS2012 with NVIDIA CUDA)
- Initialize OpenGL
- Run
`init_kernel`

on a single thread to initialize the device variables (the`init_kernel`

in our code is run from the`init()`

function) - Set the scene parameters (base vectors, viewport, camera)
- Position the spheres and assign materials for each sphere
- Set the callback function for display
- Create the VBO (
`void createVBO(GLuint* vbo)`

)

### The display function

The display function is the function called for rendering of each frame. This is where the raytracing will be invoked from.

First, we call a special kernel, we named `animate_kernel`

, which modifies the positions of the spheres for each frame, so that we have a simple animation.

Next, we map the VBO to a float3 pointer. We clear the screen and run our `rayTrace_kernel`

. After the raytracing is done (don’t worry, I’ll explain this later :D) we run the anti-aliasing on the VBO.

The anti-aliasing is done in three steps, as we are using supersampling:

- We expand the size of the image by using our
`inflate_kernel`

(each pixel from the VBO is copied into a number of pixels in the expanded image) - The
`antiAlias_kernel`

is run which is basically a low-pass filter (blur effect done with a three-by-three kernel – here the kernel refers to a matrix and not a CUDA kernel) - We run the
`deflate_kernel`

to restore the image to the VBO (in its original size).

The anti-aliasing allows us to smoothen the edges so that the scene looks more realistic.

### Raytracing

Since we ran the `rayTrace_kernel`

in a grid of threads, each kernel needs to know which pixel it is calculating the ray for. We obtain the pixel from the thread’s index in the grid in the standard way.

unsigned int i = blockIdx.x*blockDim.x + threadIdx.x; unsigned int j = blockIdx.y*blockDim.y + threadIdx.y; if (i>=(viewport.xvmax - viewport.xvmin) || j>(viewport.yvmax - viewport.yvmin)) return;

We run the kernel in a grid of threads, dimensions of which we derive from the size of the window.

dim3 block(32, 16, 1); dim3 grid(WINDOW/block.x, WINDOW/block.y, 1); rayTrace_kernel<<<grid,block>>>(dptr); HANDLE_ERROR(cudaGetLastError());

Now we can look at the code sequentially as the bare invocation of the kernels in the grid and the above step allowed the parallelism.

First, we compute the ray from the current pixel by calling the `compute_ray()`

function. This function will return a single ray, which is a struct with two properties: origin and direction. First ray’s origin is the pixel on the window, while its direction is determined by solving an equation of a line through two points (in this case, the pixel’s position and the position of the eye, named `view_point`

in our example).

Next, for each sphere in the scene, we determine whether the ray we cast intersects them. The `sphere_intersection()`

function does this. It takes the ray and sphere as parameters and returns the sphere intersection. As you can see, the `SPHERE_INTERSECTION`

struct has two parameters `lambda_in`

and `lambda_out`

. When a ray intersects the sphere it has an entry and exit point (these points can be obtained by solving a quadratic equation, proof of which I will not present here, but you can take a look at the Wikipedia article about line-sphere intersection). The `lambda_in`

parameter corresponds to the point closer to the window and the `lambda_out`

to the point further from the window (note that I am using the term window in non-graphical sense – by window I mean the matrix of pixels we are projecting the image at). Besides that, the `lambda_out`

parameter represents the location where a ray would exit the sphere if the ray were cast through it.

From all the intersections, we find the one closest to the ray’s origin.

// 1. compute ray: compute_ray(&ray, &view_point, &viewport, &pixel, &camera_frame, focal_distance); // 2. check if ray hits an object: for (int k=0; k<NSPHERES; k++) { if (sphere_intersection(&ray, &sphere[k], &intersection)) { intersection_normal(&sphere[k], &intersection, &ray); if (intersection.lambda_in<current_lambda) { current_lambda=intersection.lambda_in; intersection_object=k; current_intersection=intersection; } } } // 3. Compute the color of the pixel: if (intersection_object > -1) { ... }

Here is where we call our recursive functions. From the intersection we found, we want to cast another two rays: a reflected one and a refracted one. Here we call the `calculateReflection()`

and `calculateRefraction()`

functions and give them the current pixel’s color, current ray and current intersection as parameters. These functions will do the same thing our kernel is doing. They will cast rays from the intersection point further into the scene. However, when calling themselves, these functions will decrement a counter, which will allow the recursion to terminate. Each of these functions will modify the colors they were given as parameters. If a reflected ray hits another sphere, its color will be added to the color passed as the parameter, but its intensity will be scaled by the amount of reflectiveness of the sphere the ray was reflected from.

For calculating the reflection, we do basically the same process as for the first intersection, but we need to calculate the normal in the point of intersection, as well as the reflected ray (we do this by using the `compute_reflected_ray`

() and `intersection_normal()`

functions).

The refractions are somewhat more complicated, as we need to rotate the ray by an angle determined by the angle of entry and the refractive index of the sphere. The ray refracts two times for a single sphere, so we need to calculate two refractions: one on entry and the other one on exit. Once we determine the rotation angle for a ray, we can use a rotation matrix to rotate the vector. A cross product of the incoming ray and the normal in the point of intersection can provide an axis for rotation. We rotate the incoming ray around this axis by the angle we calculated and we have obtained the refracted ray. However, we need to do this twice, so we check this new ray’s intersection with the current sphere and use the `intersection_exit_normal`

to calculate the position and direction of the ray exiting the sphere. For simplicity, we will not consider the reflections within the spheres.

After the reflections and refractions, the pixel should be assigned the color, which is a sum of the color of the first sphere the ray intersected and colors of all the spheres the reflected and refracted rays intersected. Each of these colors need to be calculated in an appropriate way. If a point is in shadow, then only diffuse component can be used, otherwise, the Blinn-Phong shading can be used.

### Conclusion

This article is meant to provide basic understanding of how raytracing works. It would be very difficult to write a complete tutorial for such a complicated procedure, since it demands a solid understanding of analytic geometry, math and programming.

To fully understand raytracing it is necessary to implement it from start. I have tried to link to other articles, so that the readers have a better opportunity to learn. As I said in the introduction, I deliberately used less code in the text, as it would draw attention away from the explanation.

Professor Patricio Bulić provided the sequential raytracer code as well as most of the mathematical functions required for calculating rays and intersections, so this work is not entirely my own and would not be possible without his excellent presentation.

His original code and my code are provided below.

### Source code

The links for the source code:

Professor Bulić’s sequential raytracer code

My CUDA parallel raytracer code

### Related articles

There are many other articles that could help you in understanding CUDA, OpenGL and raytracing.

- http://www.igorsevo.com/Article.aspx?article=Million+particles+in+CUDA+and+OpenGL
- http://www.codermind.com/articles/Raytracer-in-C++-Introduction-What-is-ray-tracing.html
- http://www.siggraph.org/education/materials/HyperGraph/raytrace/rtrace0.htm
- http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-1-writing-a-simple-raytracer/

*17.6.2013*