diff --git a/README.md b/README.md
index cad1abd..c928103 100644
--- a/README.md
+++ b/README.md
@@ -5,16 +5,116 @@ CUDA Rasterizer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Yuxin Hu
+* Tested on: Windows 10, i7-6700HQ @ 2.60GHz 8GB, GTX 960M 4096MB (Personal Laptop)
 
-### (TODO: Your README)
+### Yuxin Hu
+## Code Change
+* rasterize.cu. Add a new function parameter to _vertexTransformAndAssembly: float **scale**. For objects that are too large to display properly on screen, a scale factor is passed in to resize them in model space.
+* rasterize.cu. Add a kernel function **_rasterizePrimitive** to set values in the fragment buffer. It has three modes: triangle, point, and line.
+* rasterize.cu. Add three function parameters to the render() function. **glm::vec3 lightDir** and **float lightIntensity**: the light direction and light intensity used by the Lambert shading model. **PrimitiveType mode**: if it is point or line, no shading model is applied; if it is triangle, the Lambert shading model is applied.
+* rasterize.cu. Add a new function **getZbyLerp**, which gets the depth of a fragment on a line between two vertices.
+* rasterize.cu. Add a new function **rasterizeLine**. A naive approach that loops through all pixels within the line's bounding box and checks whether each pixel falls on the line segment.
+* rasterize.cu. Add a new function **bresenhamLine**. This is third-party code adapted from http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/. It uses the Bresenham line algorithm to shade the fragments that form the line between two vertices.
+* rasterize.cu. Add a new function **rasterizeWireFrame**. It is the caller of bresenhamLine: it clamps the endpoints to the screen, handles horizontal and vertical lines directly, and passes all other lines to bresenhamLine.
+* rasterize.h. Add the performance timer class **PerformanceTimer**, adapted from WindyDarian (https://github.com/WindyDarian).
+* rasterizeTools.h. Add a new function **getAABBForLine**. Get the bounding box of a line segment.
+* rasterizeTools.h. Add a new function **getColorAtCoordinate**. Get the color of the fragment using barycentric interpolation, without perspective correction.
+* rasterizeTools.h. Add a new function **getEyeSpaceZAtCoordinate**. Get the eye-space z at a coordinate using barycentric interpolation.
+* rasterizeTools.h. Add a new function **getTextureAtCoord**. Get the perspective-corrected texture uv coordinate using barycentric interpolation.
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+## How to run the different rasterization modes
+* Render primitives with the Lambert shading model: change the last parameter of the two kernel calls below in rasterize() to **Triangle**
+_rasterizePrimitive (......, **Triangle**)
+render << <blockCount2d, blockSize2d >> >(......, **Triangle**);
+
+* Render primitives as points: change the last parameter of the two kernel calls below in rasterize() to **Point**
+_rasterizePrimitive (......, **Point**)
+render << <blockCount2d, blockSize2d >> >(......, **Point**);
+
+* Render primitives as lines: change the last parameter of the two kernel calls below in rasterize() to **Line**
+_rasterizePrimitive (......, **Line**)
+render << <blockCount2d, blockSize2d >> >(......, **Line**);
+
+* Render primitives with a scale factor: change the last parameter of the kernel call below in rasterize(), e.g. set it to 0.01 to render the two-cylinder engine.
+_vertexTransformAndAssembly(......, 0.01)
+
+
+## Basic Rasterizer with Bounding Box and Depth Test
+![Flower Colored with Normals](/renders/FlowerNormal2.gif)
+<p align="center"><b>Flower Colored with Normals</b></p>
+
+![Cow with Lambert Shading](/renders/Cow.gif)
+<p align="center"><b>Cow with Lambert Shading</b></p>
+  
+![Double Cylinder Engine with Lambert Shading](/renders/Engine.gif)
+<p align="center"><b>Double Cylinder Engine with Lambert Shading</b></p>
+  
+![Double Cylinder Engine Scaled with 0.01](/renders/Engine001.gif)
+<p align="center"><b>Double Cylinder Engine Scaled with 0.01</b></p>
+  
+![Character Model with Lambert Shading](/renders/Di.gif)
+<p align="center"><b>Character Model with Lambert Shading</b></p>
+  
+## Interpolate Fragment Colors Within Triangle
+![Color Interpolation Within Each Triangle](/renders/CubeColorInterpolation.PNG)
+<p align="center"><b>Color Interpolation Within Each Triangle</b></p>
+  
+## UV Texture Map
+![Checker Box with Black and White Grid Texture](/renders/CheckerBoxPerspectiveCorrect.gif)
+<p align="center"><b>Checker Box with Black and White Grid Texture</b></p>
+  
+![Yellow Duck with Texture](/renders/Duck.gif)
+<p align="center"><b>Yellow Duck with Texture</b></p>
+  
+![Cesium Milk Truck with Texture](/renders/CeciumMilkTruck.gif)
+<p align="center"><b>Cesium Milk Truck with Texture</b></p>
+
+## Point
+![Box rendered with points](/renders/PointBox.gif)
+<p align="center"><b>Box rendered with points only</b></p>
+
+![Cow rendered with points](/renders/PointCow.gif)
+<p align="center"><b>Cow rendered with points</b></p>
+
+## Line
+![Cow rendered with Lines](/renders/LineCowNaive.gif)
+<p align="center"><b>Cow rendered with lines</b></p>
+
+
+  
+## Performance Analysis
+* Rasterize Kernel Run Time Versus Depth of Object along Camera Z
+![Rasterize Kernel Run Time Versus Depth of Object](/renders/PerformanceDepth.PNG)
+<p align="center"><b>Rasterize Kernel Run Time Versus Depth of Object</b></p>
+
+In general, the closer the objects are to the camera, the longer the rasterize kernel takes to complete. The closer an object is, the larger the area each triangle occupies in screen space, so the rasterize primitive kernel has to loop over more pixels. The number of triangles is not the dominant factor: more triangles (the complex engine scaled by 0.01) do not necessarily take more time. The sudden increase in run time between -2 and -1 indicates that the bottleneck is the screen area covered by the triangles. At a very close distance only a few triangles are rendered on screen, but each of them covers almost the entire screen, and the kernel has to loop over all pixels within each bounding box, which severely affects performance.
+
+* Rasterize Pipeline Breakdown
+![Rasterize Pipeline Runtime Breakdown](/renders/PerformancePipelineBreakdown.PNG)
+<p align="center"><b>Rasterize Pipeline Runtime Breakdown</b></p>
+Except for the engine, which is scaled by 0.01, the runtimes of all objects are measured with the camera at z=-3. The runtimes of both the boxes and the triangles are long because of the slow rasterize stage, although they contain far fewer primitives. Again, this shows that the bottleneck is the loop over large triangles in screen space. In contrast, although the engine and the duck have more primitives, each primitive occupies only a small region on screen, so the loop in the rasterize stage is shorter. Parallel kernel threads do improve performance as the number of primitives increases; the bottleneck appears when a primitive occupies a large part of the screen and the thread handling it takes a long time to finish.
+
+![Rasterize Pipeline Runtime Percentage Breakdown](/renders/PerformancePipelinePercentage.PNG)
+<p align="center"><b>Rasterize Pipeline Runtime Percentage Breakdown</b></p>
+As the number of primitives increases, the percentage of runtime taken by vertex transformation, primitive assembly, and render increases.
+
+
+
+* Rasterize Kernel Run Time Versus Texture Read
+![Rasterize Kernel Run Time Of Checkerbox](/renders/PerformanceTexture.PNG)
+<p align="center"><b>Rasterize Kernel Run Time Of Checkerbox</b></p>
+
+Rendering the checkerbox with texture reads takes twice as long as rendering it without them.
+
+* Rasterize Line Methods Comparison
+![Rasterize Line Methods Comparison](/renders/PerformanceLineRasterize.PNG)
+<p align="center"><b>Rasterize Line Methods Comparison</b></p>
+I first used a naive approach to render lines: loop through all pixels within the line's bounding box and check whether each pixel falls on the line. I also tested the Bresenham line algorithm as described in http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/. The idea is that for a line in the first octant, where the slope is between 0 and 1, we increment x at every step and render either (x+1, y) or (x+1, y+1), depending on which pixel is closer to the line. Lines in other octants are converted to the first octant and handled the same way. This method avoids looping through all pixels in the bounding box, most of which do not fall on the line. The performance analysis shows that the Bresenham line algorithm is almost 4 times faster than the naive approach.
 
 
 ### Credits
 
 * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo)
 * [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md)
+* [Bresenham Line Algorithm Code](http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/)
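
The Bresenham description in the README above can be illustrated with a small host-side sketch. This is not the code from the referenced article or from `bresenhamLine` in rasterize.cu; it is the common integer-error formulation that covers all octants in one loop, and the function name `bresenhamPixels` is hypothetical. It returns the visited pixels instead of writing to a fragment buffer so the result is easy to inspect.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the repository): collects the pixels
// visited by an integer Bresenham walk from (x0, y0) to (x1, y1).
std::vector<std::pair<int, int>> bresenhamPixels(int x0, int y0, int x1, int y1) {
    std::vector<std::pair<int, int>> pixels;
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int sx = x0 < x1 ? 1 : -1;   // step direction along x
    const int sy = y0 < y1 ? 1 : -1;   // step direction along y
    int err = dx + dy;                 // combined error term for both axes

    while (true) {
        pixels.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step in y
    }
    return pixels;
}
```

For a segment from (0, 0) to (4, 2) the walk visits five pixels, one per step along the longer axis, whereas the naive method would test all 15 pixels in the 5x3 bounding box.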
diff --git a/renders/CeciumMilkTruck.gif b/renders/CeciumMilkTruck.gif
new file mode 100644
index 0000000..5fcfe17
Binary files /dev/null and b/renders/CeciumMilkTruck.gif differ
diff --git a/renders/CheckerBox.gif b/renders/CheckerBox.gif
new file mode 100644
index 0000000..19bfae5
Binary files /dev/null and b/renders/CheckerBox.gif differ
diff --git a/renders/CheckerBoxPerspectiveCorrect.gif b/renders/CheckerBoxPerspectiveCorrect.gif
new file mode 100644
index 0000000..560ff01
Binary files /dev/null and b/renders/CheckerBoxPerspectiveCorrect.gif differ
diff --git a/renders/ColorInterpolation.PNG b/renders/ColorInterpolation.PNG
new file mode 100644
index 0000000..7cb1eb7
Binary files /dev/null and b/renders/ColorInterpolation.PNG differ
diff --git a/renders/Cow.gif b/renders/Cow.gif
new file mode 100644
index 0000000..8ee2f9e
Binary files /dev/null and b/renders/Cow.gif differ
diff --git a/renders/CubeColorInterpolation.PNG b/renders/CubeColorInterpolation.PNG
new file mode 100644
index 0000000..f3b708f
Binary files /dev/null and b/renders/CubeColorInterpolation.PNG differ
diff --git a/renders/Di.gif b/renders/Di.gif
new file mode 100644
index 0000000..c08a8b8
Binary files /dev/null and b/renders/Di.gif differ
diff --git a/renders/Duck.gif b/renders/Duck.gif
new file mode 100644
index 0000000..05dcaa1
Binary files /dev/null and b/renders/Duck.gif differ
diff --git a/renders/Engine.gif b/renders/Engine.gif
new file mode 100644
index 0000000..0c088e4
Binary files /dev/null and b/renders/Engine.gif differ
diff --git a/renders/Engine001.gif b/renders/Engine001.gif
new file mode 100644
index 0000000..fbbc556
Binary files /dev/null and b/renders/Engine001.gif differ
diff --git a/renders/FlowerNormal2.gif b/renders/FlowerNormal2.gif
new file mode 100644
index 0000000..f332de5
Binary files /dev/null and b/renders/FlowerNormal2.gif differ
diff --git a/renders/LineCow.gif b/renders/LineCow.gif
new file mode 100644
index 0000000..4350ce7
Binary files /dev/null and b/renders/LineCow.gif differ
diff --git a/renders/LineCowNaive.gif b/renders/LineCowNaive.gif
new file mode 100644
index 0000000..3b0e3b2
Binary files /dev/null and b/renders/LineCowNaive.gif differ
diff --git a/renders/PerformanceDepth.PNG b/renders/PerformanceDepth.PNG
new file mode 100644
index 0000000..82988db
Binary files /dev/null and b/renders/PerformanceDepth.PNG differ
diff --git a/renders/PerformanceLineRasterize.PNG b/renders/PerformanceLineRasterize.PNG
new file mode 100644
index 0000000..d4c23ce
Binary files /dev/null and b/renders/PerformanceLineRasterize.PNG differ
diff --git a/renders/PerformancePipelineBreakdown.PNG b/renders/PerformancePipelineBreakdown.PNG
new file mode 100644
index 0000000..bba831b
Binary files /dev/null and b/renders/PerformancePipelineBreakdown.PNG differ
diff --git a/renders/PerformancePipelinePercentage.PNG b/renders/PerformancePipelinePercentage.PNG
new file mode 100644
index 0000000..565bd73
Binary files /dev/null and b/renders/PerformancePipelinePercentage.PNG differ
diff --git a/renders/PerformanceTexture.PNG b/renders/PerformanceTexture.PNG
new file mode 100644
index 0000000..60afdbd
Binary files /dev/null and b/renders/PerformanceTexture.PNG differ
diff --git a/renders/PointBox.gif b/renders/PointBox.gif
new file mode 100644
index 0000000..d856652
Binary files /dev/null and b/renders/PointBox.gif differ
diff --git a/renders/PointCow.gif b/renders/PointCow.gif
new file mode 100644
index 0000000..ba23f3f
Binary files /dev/null and b/renders/PointCow.gif differ
diff --git a/src/main.cpp b/src/main.cpp
index 7986959..b2eb9b9 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -121,6 +121,7 @@ void runCuda() {
 
     cudaGLMapBufferObject((void **)&dptr, pbo);
 	rasterize(dptr, MVP, MV, MV_normal);
+	cout<< z_trans<<" "<<timer().getGpuElapsedTimeForPreviousOperation()<<endl;
     cudaGLUnmapBufferObject(pbo);
 
     frame++;
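
The line added to `runCuda()` above logs the camera z translation next to the rasterize kernel time, which is the data behind the README's run-time-versus-depth plot. A rough host-side sketch of why the bounding-box loop grows as the object approaches the camera is given below. It assumes a simple pinhole projection with a hypothetical focal length in pixels; the names `Vec3` and `boundingBoxPixels` are illustrative and do not exist in the repository.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical helper: projects a triangle placed `distance` units in front of
// a pinhole camera and returns how many pixels its screen-clamped bounding box
// covers, i.e. how many iterations a per-triangle bounding-box loop would run.
// Assumes distance + z > 0 for every vertex.
long long boundingBoxPixels(const Vec3 tri[3], double distance,
                            double focalPx, int width, int height) {
    double minX = 1e300, minY = 1e300, maxX = -1e300, maxY = -1e300;
    for (int i = 0; i < 3; ++i) {
        const double depth = distance + tri[i].z;
        const double sx = focalPx * tri[i].x / depth + width / 2.0;
        const double sy = focalPx * tri[i].y / depth + height / 2.0;
        minX = std::min(minX, sx); maxX = std::max(maxX, sx);
        minY = std::min(minY, sy); maxY = std::max(maxY, sy);
    }
    // Entirely off screen: nothing to loop over.
    if (maxX < 0 || maxY < 0 || minX > width - 1 || minY > height - 1) return 0;
    const long long c0 = std::max(0LL, static_cast<long long>(std::floor(minX)));
    const long long c1 = std::min<long long>(width - 1, static_cast<long long>(std::floor(maxX)));
    const long long r0 = std::max(0LL, static_cast<long long>(std::floor(minY)));
    const long long r1 = std::min<long long>(height - 1, static_cast<long long>(std::floor(maxY)));
    return (c1 - c0 + 1) * (r1 - r0 + 1);
}
```

Under these assumptions, halving the distance roughly quadruples the number of pixels a single thread has to visit, until the bounding box is clamped to the full screen.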
diff --git a/src/rasterize.cu b/src/rasterize.cu
index 1262a09..a17a10b 100644
--- a/src/rasterize.cu
+++ b/src/rasterize.cu
@@ -17,6 +17,7 @@
 #include "rasterize.h"
 #include <glm/gtc/quaternion.hpp>
 #include <glm/gtc/matrix_transform.hpp>
+#include <chrono>
 
 namespace {
 
@@ -43,10 +44,10 @@ namespace {
 
 		 glm::vec3 eyePos;	// eye space position used for shading
 		 glm::vec3 eyeNor;	// eye space normal used for shading, cuz normal will go wrong after perspective transformation
-		// glm::vec3 col;
+		 glm::vec3 col; //color of the vertex
 		 glm::vec2 texcoord0;
 		 TextureData* dev_diffuseTex = NULL;
-		// int texWidth, texHeight;
+		 int texWidth, texHeight;
 		// ...
 	};
 
@@ -62,8 +63,8 @@ namespace {
 		// The attributes listed below might be useful, 
 		// but always feel free to modify on your own
 
-		// glm::vec3 eyePos;	// eye space position used for shading
-		// glm::vec3 eyeNor;
+		glm::vec3 eyePos;	// eye space position used for shading
+		glm::vec3 eyeNor;
 		// VertexAttributeTexcoord texcoord0;
 		// TextureData* dev_diffuseTex;
 		// ...
@@ -98,6 +99,14 @@ namespace {
 
 }
 
+//***************Performance Analysis Timer******************//
+PerformanceTimer& timer()
+{
+	static PerformanceTimer timer;
+	return timer;
+}
+//***************Performance Analysis Timer******************//
+
 static std::map<std::string, std::vector<PrimitiveDevBufPointers>> mesh2PrimitivesMap;
 
 
@@ -111,6 +120,10 @@ static glm::vec3 *dev_framebuffer = NULL;
 
 static int * dev_depth = NULL;	// you might need this buffer when doing depth test
 
+static glm::vec3 sceneLightDir = glm::normalize(glm::vec3(0, -1, -1));
+static float lightIntensity = 2.0f;
+
+
 /**
  * Kernel that writes the image to the OpenGL PBO directly.
  */
@@ -137,16 +150,27 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) {
 * Writes fragment colors to the framebuffer
 */
 __global__
-void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) {
+void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer, glm::vec3 lightDir, float lightIntensity, PrimitiveType mode) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
     int index = x + (y * w);
 
     if (x < w && y < h) {
-        framebuffer[index] = fragmentBuffer[index].color;
-
-		// TODO: add your fragment shader code here
-
+		if (mode == Line || mode == Point) {
+			framebuffer[index] = fragmentBuffer[index].color;
+		}
+		else {
+			//Lambert
+			float lambertTerm = glm::dot(fragmentBuffer[index].eyeNor, lightDir);
+			if (lambertTerm <= 0) {
+				lambertTerm = 0.2;
+			}
+			else if (lambertTerm > 1) {
+				lambertTerm = 1;
+			}
+			framebuffer[index] = glm::clamp(lambertTerm*lightIntensity*fragmentBuffer[index].color, glm::vec3(0), glm::vec3(1));
+			// TODO: add your fragment shader code here
+		}
     }
 }
 
@@ -482,7 +506,7 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) {
 							componentTypeByteSize = sizeof(VertexAttributePosition) / n;
 							dev_attribute = (BufferByte**)&dev_position;
 						}
-						else if (it->first.compare("NORMAL") == 0) {
+						else if (it->first.compare("NORMAL") == 0) {	
 							componentTypeByteSize = sizeof(VertexAttributeNormal) / n;
 							dev_attribute = (BufferByte**)&dev_normal;
 						}
@@ -628,7 +652,7 @@ void _vertexTransformAndAssembly(
 	int numVertices, 
 	PrimitiveDevBufPointers primitive, 
 	glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, 
-	int width, int height) {
+	int width, int height, float scale) {
 
 	// vertex id
 	int vid = (blockIdx.x * blockDim.x) + threadIdx.x;
@@ -638,10 +662,46 @@ void _vertexTransformAndAssembly(
 		// Multiply the MVP matrix for each vertex position, this will transform everything into clipping space
 		// Then divide the pos by its w element to transform into NDC space
 		// Finally transform x and y to viewport space
+		glm::vec4 modelSpacePos = glm::vec4(primitive.dev_position[vid].x*scale,
+											primitive.dev_position[vid].y*scale,
+											primitive.dev_position[vid].z*scale, 1);
+		glm::vec3 eyeSpacePos = multiplyMV(MV, modelSpacePos);
+		glm::vec4 projectionPos = MVP * modelSpacePos;
+		glm::vec3 ndcPos = glm::vec3(projectionPos.x / projectionPos.w,
+										projectionPos.y / projectionPos.w,
+										projectionPos.z / projectionPos.w);
+
+		glm::vec2 screenPos = glm::vec2((ndcPos.x + 1)* (width*1.0f / 2), (1 - ndcPos.y)* (height*1.0f / 2));
+		glm::vec3 modelSpaceNor = glm::vec3(primitive.dev_normal[vid]);
+		glm::vec3 eyeSpaceNormal = glm::normalize(MV_normal * modelSpaceNor);
 
 		// TODO: Apply vertex assembly here
 		// Assemble all attribute arraies into the primitive array
+		primitive.dev_verticesOut[vid].pos = glm::vec4(screenPos.x, screenPos.y, eyeSpacePos.z,1);
+		primitive.dev_verticesOut[vid].eyePos = eyeSpacePos;
+		primitive.dev_verticesOut[vid].eyeNor = eyeSpaceNormal;
 		
+		//Read color from texture file
+		if (primitive.dev_diffuseTex != NULL) {
+			primitive.dev_verticesOut[vid].dev_diffuseTex = primitive.dev_diffuseTex;
+			primitive.dev_verticesOut[vid].texcoord0 = primitive.dev_texcoord0[vid];
+			primitive.dev_verticesOut[vid].texWidth = primitive.diffuseTexWidth;
+			primitive.dev_verticesOut[vid].texHeight = primitive.diffuseTexHeight;
+			primitive.dev_verticesOut[vid].col = glm::vec3(0.6);
+		}
+		else {
+			primitive.dev_verticesOut[vid].col = glm::vec3(0.6);
+			//Test for color interpolation
+			/*if (vid % 3 == 0) {
+				primitive.dev_verticesOut[vid].col = glm::vec3(vid*1.0 / numVertices, 0, 0);
+			}
+			if (vid % 3 == 1) {
+				primitive.dev_verticesOut[vid].col = glm::vec3(0, vid*1.0 / numVertices, 0);
+			}
+			if (vid % 3 == 2) {
+				primitive.dev_verticesOut[vid].col = glm::vec3(0, 0, vid*1.0 / numVertices);
+			}*/
+		}	
 	}
 }
 
@@ -660,12 +720,12 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_
 		// TODO: uncomment the following code for a start
 		// This is primitive assembly for triangles
 
-		//int pid;	// id for cur primitives vector
-		//if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) {
-		//	pid = iid / (int)primitive.primitiveType;
-		//	dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType]
-		//		= primitive.dev_verticesOut[primitive.dev_indices[iid]];
-		//}
+		int pid;	// id for cur primitives vector
+		if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) {
+			pid = iid / (int)primitive.primitiveType;
+			dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType]
+				= primitive.dev_verticesOut[primitive.dev_indices[iid]];
+		}
 
 
 		// TODO: other primitive types (point, line)
@@ -673,7 +733,271 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_
 	
 }
 
+__device__
+float getZbyLerp(glm::vec2 newPos, glm::vec3 p1, glm::vec3 p2) {
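+	float fraction = (p2.x != p1.x) ? (newPos.x - p1.x) / (p2.x - p1.x) : ((p2.y != p1.y) ? (newPos.y - p1.y) / (p2.y - p1.y) : 0.0f); // avoid dividing by zero on vertical or degenerate lines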
+	return (1 - fraction)*p1.z + fraction*p2.z;
+}
+
+__device__
+void rasterizeLine(VertexOut point1, VertexOut point2, Fragment* dev_fragmentBuffer, int* dev_depth, int height, int width) {
+	glm::vec3 line[2] = { glm::vec3(point1.pos) , glm::vec3(point2.pos) };
+	AABB boundBox = getAABBForLine(line);
+	if (boundBox.min.x > width - 1 || boundBox.min.y > height - 1 || boundBox.max.x < 0 || boundBox.max.y < 0) {
+		return;
+	}
+	else {
+		boundBox.min.x = boundBox.min.x >= 0 ? boundBox.min.x : 0;
+		boundBox.max.x = boundBox.max.x < width ? boundBox.max.x : width-1;
+		boundBox.min.y = boundBox.min.y >= 0 ? boundBox.min.y : 0;
+		boundBox.max.y = boundBox.max.y < height ? boundBox.max.y : height-1;
+
+		for (int x = boundBox.min.x; x <= boundBox.max.x; x++) {
+			for (int y = boundBox.min.y; y <= boundBox.max.y; y ++) {
+				if (fabs(glm::dot(glm::normalize(glm::vec3(x - point1.pos.x, y - point1.pos.y, 0)), 
+					glm::normalize(glm::vec3(point2.pos.x - x, point2.pos.y - y, 0)))- 1) <0.005) {
+					float fragmentDepth = getZbyLerp(glm::vec2(x, y), glm::vec3(point1.pos), glm::vec3(point2.pos));
+					int mappedIntDepth = fragmentDepth * 100;
+					int fragmentIndex = y*width + x;
+					int oldDepth = atomicMin(&dev_depth[fragmentIndex], mappedIntDepth);
+					if (oldDepth > mappedIntDepth) {
+						dev_fragmentBuffer[fragmentIndex].color = glm::vec3(0.6);
+					}
+				}
+			}
+		}
+	}
+}
+
+__device__
+int minimum(int a, int b) {
+	return a < b ? a : b;
+}
+
+__device__
+int maximum(int a, int b) {
+	return a > b ? a : b;
+}
 
+__device__
+void bresenhamLine(int x0, int y0, int x1, int y1, Fragment* dev_fragmentBuffer, int width, int height) {
+	//Reference: http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/
+	int w = x1 - x0;
+	int h = y1 - y0;
+	int dx1 = 0, dy1 = 0, dx2 = 0, dy2 = 0;
+	if (w<0) dx1 = -1; else if (w>0) dx1 = 1;
+	if (h<0) dy1 = -1; else if (h>0) dy1 = 1;
+	if (w<0) dx2 = -1; else if (w>0) dx2 = 1;
+	int longest = abs(w);
+	int shortest = abs(h);
+	if (!(longest>shortest)) {
+		longest = abs(h);
+		shortest = abs(w);
+		if (h<0) dy2 = -1; else if (h>0) dy2 = 1;
+		dx2 = 0;
+	}
+	int numerator = longest >> 1;
+	for (int i = 0; i <= longest; i++) {
+		int fragmentIndex = y0*width + x0;
+		dev_fragmentBuffer[fragmentIndex].color = glm::vec3(0.6);
+		numerator += shortest;
+		if (!(numerator<longest)) {
+			numerator -= longest;
+			x0 += dx1;
+			y0 += dy1;
+		}
+		else {
+			x0 += dx2;
+			y0 += dy2;
+		}
+	}
+}
+
+__device__
+void rasterizeWireFrame(VertexOut point1, VertexOut point2, Fragment* dev_fragmentBuffer, int* dev_depth, int height, int width) {
+	int x0, y0, x1, y1;
+	//float z0, z1;
+	glm::vec3 color0, color1;
+
+	if (point1.pos.x < point2.pos.x) {
+		x0 = maximum(point1.pos.x, 0);
+		y0 = point1.pos.y;
+		if (y0 < 0) {
+			y0 = 0;
+		}
+		if (y0 > height - 1) {
+			y0 = height - 1;
+		}
+		//z0 = point1.pos.z;
+		color0 = point1.col;
+
+		x1 = minimum(point2.pos.x, width - 1);
+		y1 = point2.pos.y;
+		if (y1 < 0) {
+			y1 = 0;
+		}
+		if (y1 > height - 1) {
+			y1 = height - 1;
+		}
+		//z1 = point2.pos.z;
+		color1 = point2.col;
+
+	}
+	else {
+		x0 = maximum(point2.pos.x, 0);
+		y0 = point2.pos.y;
+		if (y0 < 0) {
+			y0 = 0;
+		}
+		if (y0 > height - 1) {
+			y0 = height - 1;
+		}
+		//z0 = point2.pos.z;
+		color0 = point2.col;
+		x1 = minimum(point1.pos.x, width - 1);
+		y1 = point1.pos.y;
+		if (y1 < 0) {
+			y1 = 0;
+		}
+		if (y1 > height - 1) {
+			y1 = height - 1;
+		}
+		//z1 = point1.pos.z;
+		color1 = point1.col;
+	}
+
+	//horizontal Line
+	if (y0 == y1) {
+		for (int x = x0; x <= x1; x++) {
+			int fragmentIndex = y0*width + x;
+			dev_fragmentBuffer[fragmentIndex].color = glm::vec3(0.6);
+		}
+	}
+	else if (x0 == x1) {
+		//vertical line
+		for (int y = y0; y <= y1; y++) {
+			int fragmentIndex = y*width + x0;
+			dev_fragmentBuffer[fragmentIndex].color = glm::vec3(0.6);
+		}
+	}
+	else {
+		bresenhamLine(x0, y0, x1, y1, dev_fragmentBuffer, width, height);
+	}
+}
+
+__global__
+void _rasterizePrimitive(int numOfPrimitives, Primitive* dev_primitives, Fragment* dev_fragmentBuffer,
+							int* dev_depth, int height, int width, PrimitiveType mode) {
+	int pid = (blockIdx.x * blockDim.x) + threadIdx.x;
+	if (pid < numOfPrimitives) {
+		if (mode == Triangle) {
+			//Take out vertices of the triangle
+			glm::vec3 p1 = glm::vec3(dev_primitives[pid].v[0].pos);
+			glm::vec3 p2 = glm::vec3(dev_primitives[pid].v[1].pos);
+			glm::vec3 p3 = glm::vec3(dev_primitives[pid].v[2].pos);
+				
+			//********************Rasterize Triangle**********************//
+			//Get bounding box for the triangle
+			glm::vec3 triangle[3] = { p1,p2,p3};
+			AABB bound = getAABBForTriangle(triangle);
+			bool outofscreen = false;
+			if (bound.max.y < 0 || bound.min.y>height - 1 || bound.max.x<0 || bound.min.x>width - 1) {
+				//primitive out of screen no need to rasterize;
+				outofscreen = true;
+			}
+			if (!outofscreen) {
+				int rowMin = bound.min.y >= 0 ? bound.min.y : 0;
+				int rowMax = bound.max.y < height ? bound.max.y : height - 1;
+				int colMin = bound.min.x >= 0 ? bound.min.x : 0;
+				int colMax = bound.max.x < width ? bound.max.x : width - 1;
+
+				for (int row = rowMin; row <= rowMax; row++) {
+					for (int col = colMin; col <= colMax; col++) {
+						int fragmentIndex = row*width + col;
+						glm::vec2 fragmentCoord = glm::vec2(col, row);
+						glm::vec3 baryCentricFragment = calculateBarycentricCoordinate(triangle, fragmentCoord);
+						if (isBarycentricCoordInBounds(baryCentricFragment)) {
+							//Apply Texture if available						
+							//Check with depth buffer
+							float fragmentDepth = getZAtCoordinate(baryCentricFragment, triangle);
+							int mappedIntDepth = fragmentDepth * 100;
+							//change to atomic compare
+							int oldDepth = atomicMin(&dev_depth[fragmentIndex], mappedIntDepth);
+							if (oldDepth > mappedIntDepth) {
+								if (dev_primitives[pid].v[0].dev_diffuseTex != NULL) {
+									glm::vec2 texture[3] = { dev_primitives[pid].v[0].texcoord0,
+										dev_primitives[pid].v[1].texcoord0,
+										dev_primitives[pid].v[2].texcoord0 };
+									float depths[3] = { p1.z,p2.z,p3.z};
+									float fragmentDepthEyeSpace = getEyeSpaceZAtCoordinate(baryCentricFragment, triangle);
+									glm::vec2 fragmentTextureCoord = getTextureAtCoord(baryCentricFragment, texture, depths, fragmentDepthEyeSpace);
+									int imageWidth = dev_primitives[pid].v[0].texWidth;
+									int imageHeight = dev_primitives[pid].v[0].texHeight;
+									int textureIndex = ((int)(fragmentTextureCoord.y*imageHeight))*imageWidth + (int)(fragmentTextureCoord.x*imageWidth);
+									float r = dev_primitives[pid].v[0].dev_diffuseTex[textureIndex * 3];
+									float g = dev_primitives[pid].v[0].dev_diffuseTex[textureIndex * 3 + 1];
+									float b = dev_primitives[pid].v[0].dev_diffuseTex[textureIndex * 3 + 2];
+									dev_fragmentBuffer[fragmentIndex].color = glm::vec3(r / 255, g / 255, b / 255);
+								}
+								else {
+									glm::vec3 color[3] = { dev_primitives[pid].v[0].col,
+										dev_primitives[pid].v[1].col,
+										dev_primitives[pid].v[2].col };
+									dev_fragmentBuffer[fragmentIndex].color = getColorAtCoordinate(baryCentricFragment, color);
+									//Test Normal
+									//dev_fragmentBuffer[fragmentIndex].color = dev_primitives[pid].v[0].eyeNor;
+								}
+								dev_fragmentBuffer[fragmentIndex].eyeNor = dev_primitives[pid].v[0].eyeNor;
+							}
+
+							/*******************No Depth Test*****************/
+							//dev_fragmentBuffer[fragmentIndex].color = dev_primitives[pid].v[0].eyeNor;
+							//dev_fragmentBuffer[fragmentIndex].color = glm::vec3(1.0f);
+							/*******************No Depth Test*****************/
+						}
+					}
+				}
+			}			
+			//********************Rasterize Triangle**********************//
+		}
+		if (mode == Point) {
+			//*******************Rasterize Point***********************//
+			for (int index = 0; index < 3; index++) {
+				glm::vec3 p = glm::vec3(dev_primitives[pid].v[index].pos);				
+				int startRow = floor(p.y) - 1 > 0 ? floor(p.y) - 1 : 0;
+				int endRow = floor(p.y) + 1 < height ? floor(p.y) + 1 : height-1;
+				int startCol = floor(p.x) - 1 > 0 ? floor(p.x) - 1 : 0;
+				int endCol = floor(p.x) + 1 < width ? floor(p.x) + 1 : width-1;
+				float fragmentDepth = p.z;
+				int mappedIntDepth = fragmentDepth * 100;
+				//Color the surrounding fragments
+				for (int x = startCol; x <= endCol; x++) {
+					for (int y = startRow; y <= endRow; y++) {
+						int fragmentIndex = y*width + x;
+						int oldDepth = atomicMin(&dev_depth[fragmentIndex], mappedIntDepth);
+						if (oldDepth > mappedIntDepth) {
+							dev_fragmentBuffer[fragmentIndex].color = dev_primitives[pid].v[index].col;
+						}
+					}
+				}
+			}
+			//*******************Rasterize Point***********************//
+		}
+		if (mode == Line) {
+			//*******************Rasterize Line***********************//
+			//**** rasterizeWireFrame uses Bresenham algorithm, third party code****//
+			rasterizeWireFrame(dev_primitives[pid].v[0], dev_primitives[pid].v[1], dev_fragmentBuffer, dev_depth, height, width);
+			rasterizeWireFrame(dev_primitives[pid].v[0], dev_primitives[pid].v[2], dev_fragmentBuffer, dev_depth,height, width);
+			rasterizeWireFrame(dev_primitives[pid].v[1], dev_primitives[pid].v[2], dev_fragmentBuffer, dev_depth, height, width);
+
+			//**** rasterizeLine uses naive approach, looping through all pixels within line bounding box****//
+			//rasterizeLine(dev_primitives[pid].v[0], dev_primitives[pid].v[1], dev_fragmentBuffer, dev_depth, height, width);
+			//rasterizeLine(dev_primitives[pid].v[0], dev_primitives[pid].v[2], dev_fragmentBuffer, dev_depth, height, width);
+			//rasterizeLine(dev_primitives[pid].v[1], dev_primitives[pid].v[2], dev_fragmentBuffer, dev_depth, height, width);
+			//*******************Rasterize Line***********************//
+		}
+	}
+}
 
 /**
  * Perform rasterization.
@@ -695,6 +1019,11 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g
 		auto it = mesh2PrimitivesMap.begin();
 		auto itEnd = mesh2PrimitivesMap.end();
 
+		//Every scene contains multiple meshes.
+		//Every mesh consists of multiple primitives.
+		//`it` iterates over the meshes in the scene;
+		//`p` iterates over the primitives of each mesh.
+		
 		for (; it != itEnd; ++it) {
 			auto p = (it->second).begin();	// each primitive
 			auto pEnd = (it->second).end();
@@ -702,7 +1031,7 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g
 				dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x);
 				dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x);
 
-				_vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height);
+				_vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height, 1);
 				checkCUDAError("Vertex Processing");
 				cudaDeviceSynchronize();
 				_primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> >
@@ -719,15 +1048,24 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g
 		checkCUDAError("Vertex Processing and Primitive Assembly");
 	}
 	
+	
+
 	cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment));
 	initDepth << <blockCount2d, blockSize2d >> >(width, height, dev_depth);
 	
 	// TODO: rasterize
+	//dev_primitives
+	dim3 numBlocksForPrimitives((totalNumPrimitives + 128- 1) / 128);
 
-
+	timer().startGpuTimer();
+	_rasterizePrimitive << <numBlocksForPrimitives, 128 >> > (curPrimitiveBeginId, dev_primitives, dev_fragmentBuffer, 
+		dev_depth, height, width, Triangle);
+	timer().endGpuTimer();
 
     // Copy depthbuffer colors into framebuffer
-	render << <blockCount2d, blockSize2d >> >(width, height, dev_fragmentBuffer, dev_framebuffer);
+	
+	render << <blockCount2d, blockSize2d >> >(width, height, dev_fragmentBuffer, dev_framebuffer, -sceneLightDir, lightIntensity, Triangle);
+	
 	checkCUDAError("fragment shader");
     // Copy framebuffer into OpenGL buffer for OpenGL previewing
     sendImageToPBO<<<blockCount2d, blockSize2d>>>(pbo, width, height, dev_framebuffer);
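
Note on the launch configuration used in the hunks above: each grid is sized with a rounding-up integer division, so the last, partially filled block still covers the remaining elements. A minimal host-side sketch of that calculation, in plain C++ so it compiles without a GPU. `blocksFor` is a hypothetical helper name used only for illustration; it is not a function in this patch.

```cpp
// Hypothetical helper (not part of the patch): smallest number of blocks such
// that blocks * threadsPerBlock >= count, i.e. ceil(count / threadsPerBlock)
// computed with integer arithmetic only.
static int blocksFor(int count, int threadsPerBlock) {
    return (count + threadsPerBlock - 1) / threadsPerBlock;
}
```

Because the rounded-up grid can contain more threads than elements, a kernel launched this way is assumed to bounds-check its index before touching memory.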
diff --git a/src/rasterize.h b/src/rasterize.h
index 560aae9..48c96bd 100644
--- a/src/rasterize.h
+++ b/src/rasterize.h
@@ -8,17 +8,110 @@
 
 #pragma once
 
+#include <chrono>
+#include <stdexcept>
 #include <glm/glm.hpp>
 #include <glm/gtc/matrix_transform.hpp>
 #include <glm/gtx/transform.hpp>
 
+class PerformanceTimer
+{
+	/**
+	* This class is used for timing the performance
+	* Uncopyable and unmovable
+	*
+	* Adapted from WindyDarian(https://github.com/WindyDarian)
+	*/
+public:
+	PerformanceTimer()
+	{
+		cudaEventCreate(&event_start);
+		cudaEventCreate(&event_end);
+	}
+
+	~PerformanceTimer()
+	{
+		cudaEventDestroy(event_start);
+		cudaEventDestroy(event_end);
+	}
+
+	void startCpuTimer()
+	{
+		if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); }
+		cpu_timer_started = true;
+
+		time_start_cpu = std::chrono::high_resolution_clock::now();
+	}
+
+	void endCpuTimer()
+	{
+		time_end_cpu = std::chrono::high_resolution_clock::now();
+
+		if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); }
+
+		std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu;
+		prev_elapsed_time_cpu_milliseconds =
+			static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count());
+
+		cpu_timer_started = false;
+	}
+
+	void startGpuTimer()
+	{
+		if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); }
+		gpu_timer_started = true;
+
+		cudaEventRecord(event_start);
+	}
+
+	void endGpuTimer()
+	{
+		cudaEventRecord(event_end);
+		cudaEventSynchronize(event_end);
+
+		if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); }
+
+		cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end);
+		gpu_timer_started = false;
+	}
+
+	float getCpuElapsedTimeForPreviousOperation() // noexcept (requires VS 2015 or later)
+	{
+		return prev_elapsed_time_cpu_milliseconds;
+	}
+
+	float getGpuElapsedTimeForPreviousOperation() //noexcept
+	{
+		return prev_elapsed_time_gpu_milliseconds;
+	}
+
+	// remove copy and move functions
+	PerformanceTimer(const PerformanceTimer&) = delete;
+	PerformanceTimer(PerformanceTimer&&) = delete;
+	PerformanceTimer& operator=(const PerformanceTimer&) = delete;
+	PerformanceTimer& operator=(PerformanceTimer&&) = delete;
+
+private:
+	cudaEvent_t event_start = nullptr;
+	cudaEvent_t event_end = nullptr;
+
+	using time_point_t = std::chrono::high_resolution_clock::time_point;
+	time_point_t time_start_cpu;
+	time_point_t time_end_cpu;
+
+	bool cpu_timer_started = false;
+	bool gpu_timer_started = false;
+
+	float prev_elapsed_time_cpu_milliseconds = 0.f;
+	float prev_elapsed_time_gpu_milliseconds = 0.f;
+};
+
 namespace tinygltf{
 	class Scene;
 }
 
-
+PerformanceTimer& timer();
 void rasterizeInit(int width, int height);
 void rasterizeSetBuffers(const tinygltf::Scene & scene);
-
 void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal);
 void rasterizeFree();
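
Note on the CPU half of `PerformanceTimer` above: it is a start/stop pattern on `std::chrono::high_resolution_clock` with a guard flag against mismatched calls. A stripped-down sketch of just that pattern, with no CUDA dependency so it can be compiled on its own. `CpuStopwatch` and its member names are illustrative assumptions, not part of this patch.

```cpp
#include <chrono>
#include <stdexcept>

// Illustrative CPU-only stopwatch (hypothetical; mirrors the start/end
// bookkeeping of PerformanceTimer's CPU timer, without the CUDA events).
class CpuStopwatch {
    using Clock = std::chrono::high_resolution_clock;

public:
    // Marks the start of a measurement; throws if one is already running.
    void start() {
        if (running_) { throw std::runtime_error("CPU timer already started"); }
        running_ = true;
        start_ = Clock::now();
    }

    // Ends the measurement and returns the elapsed time in milliseconds.
    // The clock is read before the state check so the check itself is not timed.
    float stop() {
        const Clock::time_point end = Clock::now();
        if (!running_) { throw std::runtime_error("CPU timer not started"); }
        running_ = false;
        const std::chrono::duration<double, std::milli> elapsed = end - start_;
        return static_cast<float>(elapsed.count());
    }

private:
    Clock::time_point start_;
    bool running_ = false;
};
```

Returning the elapsed time from `stop()` is a simplification for this sketch; the class in the patch stores it and exposes it through a separate getter instead.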
diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h
index 46c701e..542a3d5 100644
--- a/src/rasterizeTools.h
+++ b/src/rasterizeTools.h
@@ -43,6 +43,20 @@ AABB getAABBForTriangle(const glm::vec3 tri[3]) {
     return aabb;
 }
 
+__host__ __device__ static
+AABB getAABBForLine(const glm::vec3 line[2]) {
+	AABB aabb;
+	aabb.min = glm::vec3(
+		min(line[0].x, line[1].x),
+		min(line[0].y, line[1].y),
+		min(line[0].z, line[1].z));
+	aabb.max = glm::vec3(
+		max(line[0].x, line[1].x),
+		max(line[0].y, line[1].y),
+		max(line[0].z, line[1].z));
+	return aabb;
+}
+
 // CHECKITOUT
 /**
  * Calculate the signed area of a given triangle.
@@ -99,3 +113,24 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3])
            + barycentricCoord.y * tri[1].z
            + barycentricCoord.z * tri[2].z);
 }
+
+__host__ __device__ static
+glm::vec3 getColorAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 color[3]) {
+	return (barycentricCoord.x * color[0]
+		+ barycentricCoord.y * color[1]
+		+ barycentricCoord.z * color[2]);
+}
+
+__host__ __device__ static
+float getEyeSpaceZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3]) {
+	return 1.0f / (barycentricCoord.x / tri[0].z
+		+ barycentricCoord.y / tri[1].z
+		+ barycentricCoord.z / tri[2].z);
+}
+
+__host__ __device__ static
+glm::vec2 getTextureAtCoord(const glm::vec3 barycentricCoord, const glm::vec2 texture[3], const float depths[3], const float zdepth) {
+	return (barycentricCoord.x * texture[0] * (zdepth / depths[0])
+		+ barycentricCoord.y * texture[1] * (zdepth / depths[1])
+		+ barycentricCoord.z * texture[2] * (zdepth / depths[2]));
+}