diff --git a/README.html b/README.html new file mode 100644 index 0000000..1a7bb74 --- /dev/null +++ b/README.html @@ -0,0 +1,1072 @@ +README

CUDA Rasterizer

+

CLICK ME FOR INSTRUCTION OF THIS PROJECT

+

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4

+ +

README

+

Credits

+
\ No newline at end of file diff --git a/README.md b/README.md index 41b91f0..de9e5c1 100644 --- a/README.md +++ b/README.md @@ -5,16 +5,85 @@ CUDA Rasterizer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE - * (TODO) [LinkedIn](), [personal website](), [twitter](), etc. -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Emily Vo + * [LinkedIn](linkedin.com/in/emilyvo), [personal website](emilyhvo.com) +* Tested on: Windows 10, i7-7700HQ @ 2.8GHz 16GB, GTX 1060 6GB (Personal Computer) +Updated the CMakeLists.txt to sm_61. -### (TODO: Your README) +### Blinn +![](img/blinn_normals.gif) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +### Lambert +![](img/lambert_normals.gif) +### README +This project is a rasterizer implemented in CUDA. A typical rasterization pipeline was implemented. First, the vertex attributes were computed and assembled (e.g. eye position, eye normal, texture coordinates, and texture properties). The primitives were also assembled. The primitives are then passed to the rasterization step, where the program can choose to rasterize triangles, points, or lines. The fragment's normal, eye points and UVs are determined through barycentric interpolation. The fragment color can be read from a texture in the rasterize step using the newly interpolated UVs. Finally, The fragments are passed to the render step, where two types of shaders are implemented - Blinn and Lambert. + +#### Features +##### Point Cloud + +![](img/pointcloud.png) + +To render a point cloud, I used the NDC x and y coordinates. I tested if they were in the bounds of the screen, and then simply colored the fragment with a normal. + +##### Wireframe Rendering + +![](img/linerender.PNG) + +For each line, I iterated over the x values from x-min to x-max and then solve for y in each iteration using the point-slope formula. If the x and y points are both in the screen, then I color the fragment with the normals. + +##### Triangle Rendering + +![](img/blinn.png) + +![](img/lambert.png) + +To render the triangle, I iterate over the bounding box of triangles, and test if the point is inside the triangle using barycentric coordinates. I color the fragment if it is in the bounds of the triangle. + +##### Texture Mapping +For vanilla texture mapping, I simply converted the UVs to texture space and converted those values to indices within the flat texture array. + +In general, reading from textures are slower because you can sometimes experience cache misses when attempting to fetch a texture value. When you have a cache miss, you end up wasting more cycles trying to search for the value in physical memory. + +##### Bilinear Interpolation +Bilinear interpolation is a basic resampling technique. When reading from the texture, I sampled 4 texture coordinates that surround the uv coordinates. I then passed in the distance to these corner texture coordinates as the t-value to interpolation functions. In bilinear interpolation, you first interpolate across the x-direction at the upper bounds and lound bounds of the square domain you're interpolating. You then interpolate acros the y-direction using the two interpolated x-values to get one final scalar value. In general, you get smoother looking textures. + +Because you end up reading from a texture 4 times per fragment, you get a much more expensive computation. + + +![](https://demofox2.files.wordpress.com/2015/04/bilinear.png?w=534&zoom=2) + +Source: https://blog.demofox.org/2015/04/30/bilinear-filtering-bilinear-interpolation/ + +##### Perspective Correct UVs + +![](img/NoPC.png) + +In the case of a checkerboard plane, it is easy to see that without perspective correct UVs, the appearance of the plane is extremely distorted. + +![](img/PC.png) + +With perspective correct UVs, the texture now appears correct with foreshortening. + +#### Performance Analysis + +![](img/FPS_vs_primitive.png) + +This graph shows rendering triangles is most expensive, and rendering lines is less expensive, and rendering the points is the least expensive. It makes sense that it would be slower when you need to rasterize every pixel in the triangle for each frame, versus when you only need to shade a few points. + +![](img/FPS_vs_shader.png) + +Flat shading is not expensive. Lambert is more expensive because lambert's coefficient must be computed every frame. Blinn is more expensive because lambert's law must be computed as well as the specular component, and their contributions are weighted in the final pixel color. + +![](img/FPS_vs_texturing.png) + +The first bar is vanilla texturing. The next is bilinear texturing, and the last is bilinear with perspective correct. Bilinear is expensive because you need to sample the texture 4 times per fragment, increasing the chances of a cache miss. + +![](img/featuresTime.png) + +In this graph, each part of the pipeline is timed with different features. Throughout the graph, vertex and primitive assembly remains the same, except for when texture attributes need to be set. That explains why the vertex and primitive assembly slows down a little bit when textures are turned on. Rasterization varies the most between each feature. Rasterization experiences a huge slowdown when triangles are turned down, and another slowdown when texture mapping is turned down. Rasterization with bilinear becomes extremely slow, probably from the increased likelihood of cache misses. Fortunately, perspective correction is not as expensive to add on top of texturing and is a huge benefit to the appearance of the renders, since it is just a handful of additional mathematical operations. + ### Credits * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo) diff --git a/img/BL.png b/img/BL.png new file mode 100644 index 0000000..5a14b75 Binary files /dev/null and b/img/BL.png differ diff --git a/img/FPS_vs_primitive.png b/img/FPS_vs_primitive.png new file mode 100644 index 0000000..ced3443 Binary files /dev/null and b/img/FPS_vs_primitive.png differ diff --git a/img/FPS_vs_shader.png b/img/FPS_vs_shader.png new file mode 100644 index 0000000..d7b9ad4 Binary files /dev/null and b/img/FPS_vs_shader.png differ diff --git a/img/FPS_vs_texturing.png b/img/FPS_vs_texturing.png new file mode 100644 index 0000000..24ffc3c Binary files /dev/null and b/img/FPS_vs_texturing.png differ diff --git a/img/NoPC.png b/img/NoPC.png new file mode 100644 index 0000000..bd0ef98 Binary files /dev/null and b/img/NoPC.png differ diff --git a/img/PC.png b/img/PC.png new file mode 100644 index 0000000..1798d26 Binary files /dev/null and b/img/PC.png differ diff --git a/img/blinn.png b/img/blinn.png new file mode 100644 index 0000000..a7056fa Binary files /dev/null and b/img/blinn.png differ diff --git a/img/blinn_normals.gif b/img/blinn_normals.gif new file mode 100644 index 0000000..54e7f10 Binary files /dev/null and b/img/blinn_normals.gif differ diff --git a/img/featuresTime.png b/img/featuresTime.png new file mode 100644 index 0000000..1f52e12 Binary files /dev/null and b/img/featuresTime.png differ diff --git a/img/lambert.png b/img/lambert.png new file mode 100644 index 0000000..952ccf9 Binary files /dev/null and b/img/lambert.png differ diff --git a/img/lambert_normals.gif b/img/lambert_normals.gif new file mode 100644 index 0000000..6ca5575 Binary files /dev/null and b/img/lambert_normals.gif differ diff --git a/img/linerender.PNG b/img/linerender.PNG new file mode 100644 index 0000000..36a8041 Binary files /dev/null and b/img/linerender.PNG differ diff --git a/img/pointcloud.png b/img/pointcloud.png new file mode 100644 index 0000000..56b243f Binary files /dev/null and b/img/pointcloud.png differ diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index a57f69f..00edee0 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -6,5 +6,5 @@ set(SOURCE_FILES cuda_add_library(src ${SOURCE_FILES} - OPTIONS -arch=sm_20 + OPTIONS -arch=sm_61 ) diff --git a/src/common.cu b/src/common.cu new file mode 100644 index 0000000..2ed6d63 --- /dev/null +++ b/src/common.cu @@ -0,0 +1,39 @@ +#include "common.h" + +void checkCUDAErrorFn(const char *msg, const char *file, int line) { + cudaError_t err = cudaGetLastError(); + if (cudaSuccess == err) { + return; + } + + fprintf(stderr, "CUDA error"); + if (file) { + fprintf(stderr, " (%s:%d)", file, line); + } + fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err)); + exit(EXIT_FAILURE); +} + + +namespace StreamCompaction { + namespace Common { + + /** + * Maps an array to an array of 0s and 1s for stream compaction. Elements + * which map to 0 will be removed, and elements which map to 1 will be kept. + */ + __global__ void kernMapToBoolean(int n, int *bools, const int *idata) { + // TODO + } + + /** + * Performs scatter on an array. That is, for each element in idata, + * if bools[idx] == 1, it copies idata[idx] to odata[indices[idx]]. + */ + __global__ void kernScatter(int n, int *odata, + const int *idata, const int *bools, const int *indices) { + // TODO + } + + } +} diff --git a/src/common.h b/src/common.h new file mode 100644 index 0000000..996997e --- /dev/null +++ b/src/common.h @@ -0,0 +1,132 @@ +#pragma once + +#include +#include + +#include +#include +#include +#include +#include +#include + +#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__) +#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) + +/** + * Check for CUDA errors; print and exit if there was a problem. + */ +void checkCUDAErrorFn(const char *msg, const char *file = NULL, int line = -1); + +inline int ilog2(int x) { + int lg = 0; + while (x >>= 1) { + ++lg; + } + return lg; +} + +inline int ilog2ceil(int x) { + return x == 1 ? 0 : ilog2(x - 1) + 1; +} + +namespace StreamCompaction { + namespace Common { + __global__ void kernMapToBoolean(int n, int *bools, const int *idata); + + __global__ void kernScatter(int n, int *odata, + const int *idata, const int *bools, const int *indices); + + /** + * This class is used for timing the performance + * Uncopyable and unmovable + * + * Adapted from WindyDarian(https://github.com/WindyDarian) + */ + class PerformanceTimer + { + public: + PerformanceTimer() + { + cudaEventCreate(&event_start); + cudaEventCreate(&event_end); + } + + ~PerformanceTimer() + { + cudaEventDestroy(event_start); + cudaEventDestroy(event_end); + } + + void startCpuTimer() + { + if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); } + cpu_timer_started = true; + + time_start_cpu = std::chrono::high_resolution_clock::now(); + } + + void endCpuTimer() + { + time_end_cpu = std::chrono::high_resolution_clock::now(); + + if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); } + + std::chrono::duration duro = time_end_cpu - time_start_cpu; + prev_elapsed_time_cpu_milliseconds = + static_cast(duro.count()); + + cpu_timer_started = false; + } + + void startGpuTimer() + { + if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); } + gpu_timer_started = true; + + cudaEventRecord(event_start); + } + + void endGpuTimer() + { + cudaEventRecord(event_end); + cudaEventSynchronize(event_end); + + if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); } + + cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end); + gpu_timer_started = false; + } + + float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015 + { + return prev_elapsed_time_cpu_milliseconds; + } + + float getGpuElapsedTimeForPreviousOperation() //noexcept + { + return prev_elapsed_time_gpu_milliseconds; + } + + // remove copy and move functions + PerformanceTimer(const PerformanceTimer&) = delete; + PerformanceTimer(PerformanceTimer&&) = delete; + PerformanceTimer& operator=(const PerformanceTimer&) = delete; + PerformanceTimer& operator=(PerformanceTimer&&) = delete; + + private: + cudaEvent_t event_start = nullptr; + cudaEvent_t event_end = nullptr; + + using time_point_t = std::chrono::high_resolution_clock::time_point; + time_point_t time_start_cpu; + time_point_t time_end_cpu; + + bool cpu_timer_started = false; + bool gpu_timer_started = false; + + float prev_elapsed_time_cpu_milliseconds = 0.f; + float prev_elapsed_time_gpu_milliseconds = 0.f; + }; + } +} diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..4689c8f 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -17,7 +17,23 @@ #include "rasterize.h" #include #include +#include +#define LAMBERT 1 +#define BLINN 0 + +#define POINT 0 +#define TRI 0 +#define LINE 1 + +#define TEXTURE 1 +#define BILINEAR 1 + +#define PERSP_CORRECT 1 + +#define BACKCULL 0 + +#define TIMER 0 namespace { typedef unsigned short VertexIndex; @@ -46,7 +62,9 @@ namespace { // glm::vec3 col; glm::vec2 texcoord0; TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; + glm::vec3 camPos; + int texWidth, texHeight; + glm::vec3 mvpPos; // ... }; @@ -62,10 +80,11 @@ namespace { // The attributes listed below might be useful, // but always feel free to modify on your own - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; + glm::vec3 camPos; + VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; // ... }; @@ -109,8 +128,8 @@ static Primitive *dev_primitives = NULL; static Fragment *dev_fragmentBuffer = NULL; static glm::vec3 *dev_framebuffer = NULL; -static int * dev_depth = NULL; // you might need this buffer when doing depth test - +static float * dev_depth = NULL; // you might need this buffer when doing depth test +static int * dev_mutex = NULL; /** * Kernel that writes the image to the OpenGL PBO directly. */ @@ -143,10 +162,42 @@ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { int index = x + (y * w); if (x < w && y < h) { - framebuffer[index] = fragmentBuffer[index].color; + // TODO: add your fragment shader code here - + Fragment fragment = fragmentBuffer[index]; +#if POINT || LINE + framebuffer[index] = fragment.color; +#elif LAMBERT + glm::vec3 v = fragment.eyePos; + glm::vec3 n = fragment.eyeNor; + glm::vec3 fragColor(1, 0, 0); + glm::vec3 lightPos = glm::vec3(2, 2, 2) + v; + glm::vec3 L = glm::normalize(lightPos - v); + float lambert = glm::max(0.f, glm::dot(L, n)); + glm::vec3 ambient = glm::vec3(0.1) * fragment.color; + framebuffer[index] = ambient + 0.9f * lambert * fragment.color; +#elif BLINN + glm::vec3 lights[2] = { glm::vec3(2, 2, 2) }; + framebuffer[index] = glm::vec3(0.0f); + for (glm::vec3 lightPos : lights) { + glm::vec3 v = fragment.eyePos; + glm::vec3 n = glm::normalize(fragment.eyeNor); + glm::vec3 camPos = fragment.camPos; + glm::vec3 L = lightPos; + glm::vec3 Ev = glm::normalize(-v); + glm::vec3 R = glm::normalize(-glm::reflect(v - camPos, n)); + glm::vec3 ambient = glm::vec3(0.1) * fragment.color; + float specular = glm::pow(glm::max(glm::dot(R, Ev), 0.f), 32.f); + float lambert = glm::max(0.f, glm::dot(L, n)); + glm::vec3 diffuse = lambert * fragment.color; + framebuffer[index] += glm::clamp(ambient + diffuse*glm::vec3(0.7) + + specular * glm::vec3(0.2), glm::vec3(0), glm::vec3(1)); + } + +#else + framebuffer[index] = fragment.color; +#endif } } @@ -164,13 +215,16 @@ void rasterizeInit(int w, int h) { cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); cudaFree(dev_depth); - cudaMalloc(&dev_depth, width * height * sizeof(int)); + cudaMalloc(&dev_depth, width * height * sizeof(float)); + cudaFree(dev_mutex); + cudaMalloc(&dev_mutex, width * height * sizeof(int)); + cudaMemset(dev_mutex, 0, width * height * sizeof(int)); checkCUDAError("rasterizeInit"); } __global__ -void initDepth(int w, int h, int * depth) +void initDepth(int w, int h, float * depth) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; @@ -183,6 +237,7 @@ void initDepth(int w, int h, int * depth) } + /** * kern function with support for stride to sometimes replace cudaMemcpy * One thread is responsible for copying one component @@ -294,6 +349,189 @@ void traverseNode ( } } + +__forceinline__ __device__ glm::vec3 baryPos(VertexOut *v, glm::vec3 barycentric) { + VertexOut vert0, vert1, vert2; + vert0 = v[0]; + vert1 = v[1]; + vert2 = v[2]; + glm::vec3 p0 = vert0.mvpPos * barycentric[0]; + glm::vec3 p1 = vert1.mvpPos * barycentric[1]; + glm::vec3 p2 = vert2.mvpPos * barycentric[2]; + return p0 + p1 + p2; +} + +__forceinline__ __device__ glm::vec3 baryNorm(VertexOut *v, glm::vec3 barycentric) { + VertexOut vert0, vert1, vert2; + vert0 = v[0]; + vert1 = v[1]; + vert2 = v[2]; + glm::vec3 p0 = vert0.eyeNor * barycentric[0]; + glm::vec3 p1 = vert1.eyeNor * barycentric[1]; + glm::vec3 p2 = vert2.eyeNor * barycentric[2]; + return p0 + p1 + p2; +} + + +__forceinline__ __device__ glm::vec2 baryUVs(VertexOut *v, glm::vec3 barycentric) { + VertexOut vert0, vert1, vert2; + vert0 = v[0]; + vert1 = v[1]; + vert2 = v[2]; + glm::vec2 p0 = vert0.texcoord0 * barycentric[0]; + glm::vec2 p1 = vert1.texcoord0 * barycentric[1]; + glm::vec2 p2 = vert2.texcoord0 * barycentric[2]; + return p0 + p1 + p2; +} + + +__forceinline__ __device__ glm::vec2 baryUVsPerspective(VertexOut *v, glm::vec3 barycentric) { + VertexOut v0, v1, v2; + v0 = v[0]; + v1 = v[1]; + v2 = v[2]; + + glm::vec2 texCoordZ = barycentric.x * (v0.texcoord0 / v0.eyePos.z) + barycentric.y * (v1.texcoord0 / v1.eyePos.z) + barycentric.z * (v2.texcoord0 / v2.eyePos.z); + float coordZ = barycentric.x * (1.0f / v0.eyePos.z) + barycentric.y * (1.0f / v1.eyePos.z) + barycentric.z * (1.0f / v2.eyePos.z); + return texCoordZ / coordZ; +} + +__device__ +glm::vec3 getColor(TextureData* tex, int width, float x, float y) { + int i = x + y * width; + return glm::vec3(tex[i * 3], tex[i * 3 + 1], tex[i * 3 + 2]) / 255.f; +} + +__global__ void kernelRasterize(int totalNumPrimitives, Primitive *dev_primitives, Fragment *dev_fragmentBuffer, float *dev_depth, int *dev_mutex, int width, int height) { + int pid = (blockIdx.x * blockDim.x) + threadIdx.x; + if (pid > totalNumPrimitives) return; + // compute triangle from primitive + Primitive primitive = dev_primitives[pid]; + glm::vec3 v0, v1, v2; + v0 = glm::vec3(primitive.v[0].pos); + v1 = glm::vec3(primitive.v[1].pos); + v2 = glm::vec3(primitive.v[2].pos); + glm::vec3 triangle[3] = { v0, v1, v2 }; +#if BACKCULL + if (glm::dot(primitive.v->eyeNor, primitive.v->camPos - primitive.v->eyePos) < -1.f) { + return; + } +#endif + +#if POINT +int x, y; +for (int i = 0; i < 3; i++) { + x = triangle[i].x; + y = triangle[i].y; + int fragmentId = x + y * width; + if ((x >= 0 && x <= width - 1) && (y >= 0 && y <= height - 1)) { + dev_fragmentBuffer[fragmentId].color = primitive.v->eyeNor; + } +} +#elif LINE +for (int i = 0; i < 3; i++) { + int x1 = triangle[i].x; + int x2 = triangle[i + 1].x; + int y1 = triangle[i].y; + int y2 = triangle[i + 1].y; + int dx = x2 - x1; + int dy = y2 - y1; + for (int x = x1; x <= x2; x++) { + int y = y1 + dy * (x - x1) / dx; + int fragmentId = x + y * width; + if (x < 0 || x >= width) continue; + if (y < 0 || y >= height) continue; + dev_fragmentBuffer[fragmentId].color = primitive.v->eyeNor; + } +} +#elif TRI + // compute bounding box and clip to screen + AABB boundingBox = getAABBForTriangle(triangle); + const int minX = glm::min(width - 1, glm::max(0, (int)boundingBox.min.x)); + const int minY = glm::min(height - 1, glm::max(0, (int)boundingBox.min.y)); + const int maxX = glm::min(width - 1, glm::max(0, (int)boundingBox.max.x)); + const int maxY = glm::min(height - 1, glm::max(0, (int)boundingBox.max.y)); + + // iterate over bounding box and test which pixels are inside + for (int x = minX; x <= maxX; ++x) { + for (int y = minY; y <= maxY; ++y) { + glm::vec3 barycentric = calculateBarycentricCoordinate(triangle, glm::vec2(x, y)); + bool inTriangle = isBarycentricCoordInBounds(barycentric); + + if (inTriangle) { + const int fragmentId = x + (y * width); + + bool isSet; + do { + // it was unlocked so we lock it + isSet = atomicCAS(&dev_mutex[fragmentId], 0, 1) == 0; + if (isSet) { + float depth = -getZAtCoordinate(barycentric, triangle) * INT_MAX; + + // if this fragment is closer, we set the new depth and fragment + if (depth < dev_depth[fragmentId]) { + dev_depth[fragmentId] = depth; + Fragment &fragment = dev_fragmentBuffer[fragmentId]; + fragment.eyeNor = baryNorm(primitive.v, barycentric); + fragment.eyePos = baryPos(primitive.v, barycentric); + fragment.color = fragment.eyeNor; + fragment.color = glm::vec3(1.f); + fragment.camPos = primitive.v[0].camPos; +#if PERSP_CORRECT + glm::vec2 uvs = baryUVsPerspective(primitive.v, barycentric); + fragment.texcoord0 = uvs; + auto v0 = primitive.v[0]; + auto v1 = primitive.v[1]; + auto v2 = primitive.v[2]; + glm::vec2 texCoordZ = barycentric.x * (v0.texcoord0 / v0.eyePos.z) + barycentric.y * (v1.texcoord0 / v1.eyePos.z) + barycentric.z * (v2.texcoord0 / v2.eyePos.z); + float coordZ = barycentric.x * (1.0f / v0.eyePos.z) + barycentric.y * (1.0f / v1.eyePos.z) + barycentric.z * (1.0f / v2.eyePos.z); + fragment.texcoord0 = texCoordZ / coordZ; +#else + glm::vec2 uvs = baryUVs(primitive.v, barycentric); + fragment.texcoord0 = uvs; +#endif + +#if BILINEAR + if (primitive.v->dev_diffuseTex != NULL) { + float x = uvs[0] * primitive.v->texWidth; + float y = uvs[1] * primitive.v->texHeight; + int xx = glm::floor(x); + int yy = glm::floor(y); + float xfract = x - xx; + float yfract = y - yy; + float xinv = 1.f - xfract; + float yinv = 1.f - yfract; + + TextureData *text = primitive.v->dev_diffuseTex; + int width = primitive.v->texWidth; + glm::vec3 tex00 = getColor(text, width, xx, yy); + glm::vec3 tex10 = getColor(text, width, xx + 1, yy); + glm::vec3 tex01 = getColor(text, width, xx, yy + 1); + glm::vec3 tex11 = getColor(text, width, xx + 1, yy + 1); + + fragment.color = (tex00 * xinv + tex10 * xfract) * yinv + (tex01 * xinv + tex11 * xfract) * yfract; + } +#elif TEXTURE + if (primitive.v->dev_diffuseTex != NULL) { + float x = fragment.texcoord0[0] * primitive.v->texWidth; + float y = fragment.texcoord0[1] * primitive.v->texHeight; + TextureData *text = primitive.v->dev_diffuseTex; + int width = primitive.v->texWidth; + fragment.color = getColor(text, width, glm::floor(x), glm::floor(y)); + } +#else + //fragment.color = baryNorm(primitive.v, barycentric); +#endif + } + dev_mutex[fragmentId] = 0; + } + } while (!isSet); + } + } + } +#endif +} + void rasterizeSetBuffers(const tinygltf::Scene & scene) { totalNumPrimitives = 0; @@ -327,7 +565,7 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { } } - + // 2. for each mesh: // for each primitive: @@ -638,10 +876,27 @@ void _vertexTransformAndAssembly( // Multiply the MVP matrix for each vertex position, this will transform everything into clipping space // Then divide the pos by its w element to transform into NDC space // Finally transform x and y to viewport space + glm::vec4 objPos(primitive.dev_position[vid], 1.f); + glm::vec4 eyePos = MVP * objPos; + glm::vec4 mvpPos = eyePos; + mvpPos /= mvpPos.w; + mvpPos.x = 0.5f * float(width) * (mvpPos.x + 1.f); + mvpPos.y = 0.5f * float(height) * (1.f - mvpPos.y); + mvpPos.z = -mvpPos.z; // TODO: Apply vertex assembly here // Assemble all attribute arraies into the primitive array - + VertexOut &vo = primitive.dev_verticesOut[vid]; + vo.pos = mvpPos; + + vo.eyePos = glm::vec3(MV * objPos);//glm::vec3(eyePos[0], eyePos[1], eyePos[2]); + vo.mvpPos = glm::vec3(MVP * objPos); + vo.eyeNor = glm::normalize(MV_normal * primitive.dev_normal[vid]); + vo.camPos = glm::vec3(MV * glm::vec4(0, 0, 0, 1)); + if (primitive.dev_texcoord0) vo.texcoord0 = primitive.dev_texcoord0[vid]; + if (primitive.dev_diffuseTex) vo.dev_diffuseTex = primitive.dev_diffuseTex; + vo.texHeight = primitive.diffuseTexHeight; + vo.texWidth = primitive.diffuseTexWidth; } } @@ -660,12 +915,12 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ // TODO: uncomment the following code for a start // This is primitive assembly for triangles - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + } // TODO: other primitive types (point, line) @@ -679,6 +934,11 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ * Perform rasterization. */ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) { + using time_point_t = std::chrono::high_resolution_clock::time_point; + time_point_t start_time, end_time; + float elapsed_time; + std::chrono::duration dur; + int sideLength2d = 8; dim3 blockSize2d(sideLength2d, sideLength2d); dim3 blockCount2d((width - 1) / blockSize2d.x + 1, @@ -694,6 +954,7 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g auto it = mesh2PrimitivesMap.begin(); auto itEnd = mesh2PrimitivesMap.end(); + for (; it != itEnd; ++it) { auto p = (it->second).begin(); // each primitive @@ -701,15 +962,37 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g for (; p != pEnd; ++p) { dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + +#if TIMER + start_time = std::chrono::high_resolution_clock::now(); +#endif _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); checkCUDAError("Vertex Processing"); cudaDeviceSynchronize(); +#if TIMER + cudaDeviceSynchronize(); + end_time = std::chrono::high_resolution_clock::now(); + dur = end_time - start_time; + elapsed_time = static_cast(dur.count()); + std::cout << "vertex processing elapsed time: " << elapsed_time << "ms." << std::endl; +#endif + +#if TIMER + start_time = std::chrono::high_resolution_clock::now(); +#endif _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > (p->numIndices, curPrimitiveBeginId, dev_primitives, *p); +#if TIMER + cudaDeviceSynchronize(); + end_time = std::chrono::high_resolution_clock::now(); + dur = end_time - start_time; + elapsed_time = static_cast(dur.count()); + std::cout << "primitive assembly elapsed time: " << elapsed_time << "ms." << std::endl; +#endif checkCUDAError("Primitive Assembly"); curPrimitiveBeginId += p->numPrimitives; @@ -723,11 +1006,33 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g initDepth << > >(width, height, dev_depth); // TODO: rasterize - - + dim3 numThreadsPerBlock(128); + dim3 numBlocksForPrimitives((totalNumPrimitives + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); +#if TIMER + start_time = std::chrono::high_resolution_clock::now(); +#endif + kernelRasterize << > > (totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, dev_mutex, width, height); +#if TIMER + cudaDeviceSynchronize(); + end_time = std::chrono::high_resolution_clock::now(); + dur = end_time - start_time; + elapsed_time = static_cast(dur.count()); + std::cout << "rasterize elapsed time: " << elapsed_time << "ms." << std::endl; +#endif // Copy depthbuffer colors into framebuffer +#if TIMER + start_time = std::chrono::high_resolution_clock::now(); +#endif render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); +#if TIMER + cudaDeviceSynchronize(); + end_time = std::chrono::high_resolution_clock::now(); + dur = end_time - start_time; + elapsed_time = static_cast(dur.count()); + std::cout << "fragment shader elapsed time: " << elapsed_time << "ms." << std::endl; +#endif + checkCUDAError("fragment shader"); // Copy framebuffer into OpenGL buffer for OpenGL previewing sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); diff --git a/src/rasterize.h b/src/rasterize.h index 560aae9..5cfd87e 100644 --- a/src/rasterize.h +++ b/src/rasterize.h @@ -12,6 +12,7 @@ #include #include + namespace tinygltf{ class Scene; }