Project 1: Gehan Zheng #14

Open: wants to merge 7 commits into `main`

README.md: 48 changes (42 additions, 6 deletions)

![1694311599535](image/README/1694311599535.gif)
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* Gehan Zheng
* [LinkedIn](https://www.linkedin.com/in/gehan-zheng-05877b24a/), [personal website](https://grahamzen.github.io/).
* Tested on: Windows 10, AMD Ryzen 7 5800H @ 3.2GHz 16GB, GeForce RTX 3060 Laptop 6144MB (Personal Laptop)

## Screenshot
* N=5000
![1694316622131](image/README/1694316622131.png)

* N=160000
![1694370123710](image/README/1694370123710.png)
## Performance Analysis

* For each implementation, how does changing the number of boids affect performance? Why do you think this is?

![1694378908002](image/README/1694378908002.png)

For all three implementations, FPS drops as the number of boids increases. For the naive implementation, this is because each boid iterates over all other boids to find its neighbors, so the work done by the CUDA kernel grows quadratically with N and its execution time dominates when N is large. For the uniform grid and the coherent uniform grid, the neighbor search itself is much cheaper, but once N exceeds a certain threshold the execution time of the non-CUDA components, such as memory swapping, increases substantially and can no longer be offset by the speedup of the CUDA components, so FPS decreases.
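
To make the naive cost concrete, here is a minimal host-side C++ sketch of a brute-force neighbor search. It is not the project's CUDA kernel: `Vec3` and `naiveNeighborChecks` are hypothetical names, and the outer loop stands in for what would be one GPU thread per boid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Brute-force neighbor search: every boid tests its distance to every other
// boid, so the number of distance checks is N * (N - 1) no matter how the
// boids are distributed. Returns that count and fills neighborCount[i] with
// the number of boids within `radius` of boid i.
std::size_t naiveNeighborChecks(const std::vector<Vec3>& pos, float radius,
                                std::vector<int>& neighborCount) {
  assert(radius > 0.0f);
  const float r2 = radius * radius;
  neighborCount.assign(pos.size(), 0);
  std::size_t checks = 0;
  for (std::size_t i = 0; i < pos.size(); ++i) {    // one CUDA thread per i on the GPU
    for (std::size_t j = 0; j < pos.size(); ++j) {  // every thread scans all boids
      if (i == j) continue;
      ++checks;
      const float dx = pos[j].x - pos[i].x;
      const float dy = pos[j].y - pos[i].y;
      const float dz = pos[j].z - pos[i].z;
      if (dx * dx + dy * dy + dz * dz < r2) ++neighborCount[i];
    }
  }
  return checks;
}
```

For N = 1000 this performs 1000 × 999 = 999,000 distance checks; doubling N to 2000 gives 3,998,000, roughly four times as many, which is why the naive version degrades fastest.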

* For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

![1694378967440](image/README/1694378967440.png)

Up to a certain block size, increasing the block size raises FPS; beyond that value, increasing it further lowers FPS. There are two likely reasons for this.

First, because there is one thread per boid, the total number of threads is fixed, so a larger block size means a smaller grid (fewer blocks). While the block size is below that value, launching and scheduling fewer blocks reduces overhead.
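
The grid size for a one-thread-per-boid launch is usually computed with ceiling division. The helpers below are a hypothetical sketch of that arithmetic (`blocksForBoids` and `idleThreads` are illustrative names, not taken from the project).

```cpp
#include <cassert>

// Number of blocks needed so that gridSize * blockSize >= n, i.e. one thread
// per boid, as used in a launch like kernel<<<gridSize, blockSize>>>(...).
int blocksForBoids(int n, int blockSize) {
  assert(n >= 0 && blockSize > 0);
  return (n + blockSize - 1) / blockSize;  // ceiling division
}

// Threads launched without a boid to process; in the kernel they would
// return early after an index bounds check.
int idleThreads(int n, int blockSize) {
  return blocksForBoids(n, blockSize) * blockSize - n;
}
```

For N = 5000, a block size of 128 gives 40 blocks (with 120 idle threads in the last block), while a block size of 256 gives 20 blocks.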

Second, each streaming multiprocessor (SM) on the GPU has a limited number of streaming processors (SPs) and registers. When the block size becomes very large, these hardware limits may prevent all threads of a block from running concurrently, so past a certain block size the threads no longer run in parallel effectively and FPS decreases.
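
The second effect can be illustrated with a deliberately simplified occupancy model. The per-SM limits in the usage example (1536 resident threads, 16 resident blocks, 65,536 registers) are assumed values for a compute capability 8.6 GPU such as the RTX 3060 Laptop, and 40 registers per thread is a hypothetical kernel; shared memory and register allocation granularity are ignored. `SmLimits`, `residentBlocks`, and `occupancy` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>

// Simplified per-SM limits. The real values should be read from the device
// (e.g. via cudaGetDeviceProperties) rather than assumed.
struct SmLimits {
  int maxThreadsPerSm;
  int maxBlocksPerSm;
  int registersPerSm;
};

// Blocks that can be resident on one SM at once: the minimum of the thread
// limit, the register limit, and the block limit. Shared memory and
// allocation granularity are deliberately ignored.
int residentBlocks(const SmLimits& sm, int blockSize, int regsPerThread) {
  assert(blockSize > 0 && regsPerThread > 0);
  const int byThreads = sm.maxThreadsPerSm / blockSize;
  const int byRegisters = sm.registersPerSm / (regsPerThread * blockSize);
  return std::min({byThreads, byRegisters, sm.maxBlocksPerSm});
}

// Fraction of the SM's thread slots that are occupied.
double occupancy(const SmLimits& sm, int blockSize, int regsPerThread) {
  const int threads = residentBlocks(sm, blockSize, regsPerThread) * blockSize;
  return static_cast<double>(threads) / sm.maxThreadsPerSm;
}
```

With `SmLimits{1536, 16, 65536}` and 40 registers per thread, a block size of 128 gives 12 resident blocks and full occupancy, whereas 1024 fits only one block per SM and leaves a third of the thread slots empty. For real measurements, `cudaOccupancyMaxActiveBlocksPerMultiprocessor` accounts for the details this sketch skips.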

* For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

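There is an obvious performance improvement with the coherent uniform grid, especially when the number of boids is large. The coherent grid checks the same candidate boids as the scattered uniform grid, so the gain comes from memory access rather than from fewer checks: because the position and velocity buffers are reshuffled so that boids in the same cell are contiguous in memory, each thread reads its neighbors' data directly from consecutive addresses instead of going through an extra index array. This removes one level of indirection and makes global-memory reads more coalesced and cache-friendly.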
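
The memory-layout difference can be sketched on the CPU. The code below is a minimal, hypothetical C++ illustration of the general technique, not the author's kernels (all names are illustrative, and it assumes a cubic domain of `res`³ cells starting at `minCorner`): boids are binned into cells, their indices are sorted by cell, per-cell start/end ranges are recorded, and positions are gathered into cell order. A scattered grid would keep only the sorted index array and read `pos[order[k]]` during the neighbor search; the coherent grid reads `sortedPos[k]` directly.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

struct CoherentGrid {
  std::vector<int> cellStart, cellEnd;  // inclusive ranges; -1 marks an empty cell
  std::vector<Vec3> sortedPos;          // positions reordered so each cell is contiguous
};

// Bins boids into a res x res x res grid of cubic cells and reshuffles the
// positions into cell order (the "coherent" layout).
CoherentGrid buildCoherentGrid(const std::vector<Vec3>& pos, Vec3 minCorner,
                               float cellWidth, int res) {
  assert(cellWidth > 0.0f && res > 0);
  const int n = static_cast<int>(pos.size());
  auto clampCell = [res](float v) {
    return std::max(0, std::min(res - 1, static_cast<int>(v)));
  };

  std::vector<int> cellOf(n), order(n);
  for (int i = 0; i < n; ++i) {
    const int cx = clampCell((pos[i].x - minCorner.x) / cellWidth);
    const int cy = clampCell((pos[i].y - minCorner.y) / cellWidth);
    const int cz = clampCell((pos[i].z - minCorner.z) / cellWidth);
    cellOf[i] = cx + cy * res + cz * res * res;  // 3D cell -> 1D index
  }

  // Sort boid indices by cell; a GPU version would typically use a
  // key-value sort such as thrust::sort_by_key.
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return cellOf[a] < cellOf[b]; });

  CoherentGrid g;
  g.cellStart.assign(res * res * res, -1);
  g.cellEnd.assign(res * res * res, -1);
  g.sortedPos.resize(n);
  for (int k = 0; k < n; ++k) {
    const int c = cellOf[order[k]];
    if (g.cellStart[c] == -1) g.cellStart[c] = k;  // first boid of this cell
    g.cellEnd[c] = k;                              // last boid seen so far
    g.sortedPos[k] = pos[order[k]];                // gather into coherent order
  }
  return g;
}
```

The gather in the last loop is the extra step the coherent grid pays for each frame; in exchange, the neighbor loop over `cellStart[c]..cellEnd[c]` touches a contiguous slice of `sortedPos`.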


* Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!

| N (boids) | 8-cell (FPS) | 27-cell (FPS) |
| --------- | ------------ | ------------- |
| 640000    | 200          | 650           |
| 1280000   | 58           | 274           |

The results above were measured with the coherent uniform grid and a block size of 128.

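It did change the performance. When the number of boids is large, the 27-cell version is faster than the 8-cell version. In the 8-cell version the cell width is twice the neighborhood distance $d$, while in the 27-cell version it equals $d$, so the volume searched around each boid is $8 \cdot (2d)^3 = 64d^3$ versus $27 \cdot d^3 = 27d^3$. In units of the 8-cell width this is $\frac{27}{2^3} < \frac{8}{1^3}$: the 27-cell version covers less than half the volume, so at the same boid density it checks fewer candidate boids for each boid, even though it visits more cells.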
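
One way to pick the cells to search, sketched in C++ under the assumption of a cubic grid of `res`³ cells (the names are hypothetical, and this is not necessarily how the project code selects cells): take every cell overlapped by the box of half-width $d$ around the boid. With a cell width of $2d$ this box touches at most 2 cells per axis (8 in total); with a cell width of $d$ it touches at most 3 per axis (27 in total).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Returns the 1D indices of all grid cells overlapped by the axis-aligned box
// [p - d, p + d], i.e. every cell that could hold a neighbor within distance d.
// Cells outside the grid are clamped away, so boundary boids get fewer cells.
std::vector<int> cellsToCheck(Vec3 p, float d, Vec3 minCorner, float cellWidth, int res) {
  assert(d > 0.0f && cellWidth > 0.0f && res > 0);
  auto toCell = [&](float v, float lo) {
    const int c = static_cast<int>(std::floor((v - lo) / cellWidth));
    return c < 0 ? 0 : (c >= res ? res - 1 : c);
  };
  const int x0 = toCell(p.x - d, minCorner.x), x1 = toCell(p.x + d, minCorner.x);
  const int y0 = toCell(p.y - d, minCorner.y), y1 = toCell(p.y + d, minCorner.y);
  const int z0 = toCell(p.z - d, minCorner.z), z1 = toCell(p.z + d, minCorner.z);

  std::vector<int> cells;
  for (int z = z0; z <= z1; ++z)
    for (int y = y0; y <= y1; ++y)
      for (int x = x0; x <= x1; ++x)  // x innermost: consecutive 1D indices
        cells.push_back(x + y * res + z * res * res);
  return cells;
}
```

In the interior of the grid, a boid at (5, 5, 5) with $d = 1$ gets 8 cells for a cell width of 2 and 27 cells for a cell width of 1, even though the 27 smaller cells cover $27 \cdot 1^3 = 27$ units of volume versus $8 \cdot 2^3 = 64$ for the 8 larger ones.
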

Binary files added:

* image/README/1694311599535.gif
* image/README/1694316622131.png
* image/README/1694370123710.png
* image/README/1694378908002.png
* image/README/1694378967440.png