DrMason
Senior Cruncher
Joined: Mar 16, 2007
Post Count: 153
OpenPandemics GPU is LIVE!

Congrats to the team for implementing the first GPU project on WCG in years. Here's to a massive increase in scientific throughput!
----------------------------------------

[Apr 6, 2021 6:45:50 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: OpenPandemics GPU is LIVE!

Will further efforts to optimize the application be pursued?

The app in its current state is only using a fraction of the GPU's power (constantly bouncing between 0% and 100% utilization). If they can get the app to rely less on shuffling data around and more on pure crunching, the tasks will process much faster. Projects with the most optimized apps tend to peg the GPU at 95-100% utilization for the entire run, but this behavior of flipping between 0% and 100% is just leaving performance on the table.
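
(For anyone who wants to watch this on their own card: something like nvidia-smi dmon, or nvidia-smi --query-gpu=utilization.gpu --format=csv -l 1, logs utilization once a second, and during one of these tasks it visibly swings between 0 and 100 instead of sitting pinned near the top.)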
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Apr 7, 2021 6:22:43 PM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 338
Re: OpenPandemics GPU is LIVE!

Will further efforts to optimize the application be pursued?

The app in its current state is only using a fraction of the GPU's power (constantly bouncing between 0% and 100% utilization). If they can get the app to rely less on shuffling data around and more on pure crunching, the tasks will process much faster. Projects with the most optimized apps tend to peg the GPU at 95-100% utilization for the entire run, but this behavior of flipping between 0% and 100% is just leaving performance on the table.


Apparently you can improve the situation by running 2 or 3 tasks concurrently within the GPU.
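
For anyone who wants to try that: in BOINC it's done with an app_config.xml dropped into the World Community Grid project folder (.../projects/www.worldcommunitygrid.org/). A minimal sketch is below; the app <name> is my guess at the GPU app's short name, so check it against what your event log reports. gpu_usage of 0.5 means two tasks per GPU (0.33 would mean three), and cpu_usage is how much CPU the scheduler budgets per GPU task.

<app_config>
   <app>
      <name>opng</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

After saving it, Options > Read config files in the BOINC Manager (advanced view) picks it up without restarting the client.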
[Apr 7, 2021 6:59:29 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: OpenPandemics GPU is LIVE!

Will further efforts to optimize the application be pursued?

The app in its current state is only using a fraction of the GPU's power (constantly bouncing between 0% and 100% utilization). If they can get the app to rely less on shuffling data around and more on pure crunching, the tasks will process much faster. Projects with the most optimized apps tend to peg the GPU at 95-100% utilization for the entire run, but this behavior of flipping between 0% and 100% is just leaving performance on the table.

Well you can run more than one WU at a time. whistling
----------------------------------------
[Apr 7, 2021 6:59:38 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: OpenPandemics GPU is LIVE!


Well you can run more than one WU at a time. whistling


I have, but it's a waste of CPU resources to tie up two CPU threads just to boost efficiency when the app could be better optimized to run faster on a single thread.

GPU memory use is very low (~300MB)
GPU memory controller use is very low (0-5%)
PCIe bus use is very low (<1%)

The app just isn't making good use of the available resources. With how short these tasks are, they could probably preload all the data into GPU memory instead of passing it back and forth from system memory (or maybe even the system disk), or better streamline the data transfer process to minimize the time the GPU spends waiting for something to do.
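
To make that concrete, the usual fix for the idle time looks something like the sketch below. It's purely illustrative, written in CUDA rather than the OpenCL the app actually uses, and none of the names come from the project; the point is just pinned host buffers plus two streams, so the upload for the next batch overlaps the kernel scoring the current one instead of copy, wait, compute, wait.

// Illustrative sketch only -- not the project's code; every name here is made up.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void score_batch(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];          // stand-in for the real docking math
}

int main(void) {
    const int batches = 8, n = 1 << 20;
    float *h_in, *h_out, *d_in[2], *d_out[2];
    cudaStream_t stream[2];

    // Pinned (page-locked) host memory is what lets cudaMemcpyAsync actually overlap with compute.
    cudaHostAlloc((void **)&h_in,  (size_t)batches * n * sizeof(float), cudaHostAllocDefault);
    cudaHostAlloc((void **)&h_out, (size_t)batches * n * sizeof(float), cudaHostAllocDefault);
    for (size_t i = 0; i < (size_t)batches * n; ++i) h_in[i] = 1.0f;

    for (int b = 0; b < 2; ++b) {
        cudaMalloc((void **)&d_in[b],  n * sizeof(float));
        cudaMalloc((void **)&d_out[b], n * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    for (int c = 0; c < batches; ++c) {
        int b = c & 1;                          // ping-pong between the two streams
        cudaMemcpyAsync(d_in[b], h_in + (size_t)c * n, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        score_batch<<<(n + 255) / 256, 256, 0, stream[b]>>>(d_in[b], d_out[b], n);
        cudaMemcpyAsync(h_out + (size_t)c * n, d_out[b], n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();                    // wait for every batch to finish
    printf("processed %d batches\n", batches);  // (error checks and frees omitted for brevity)
    return 0;
}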
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
----------------------------------------
[Edit 1 times, last edit by Ian-n-Steve C. at Apr 7, 2021 7:15:05 PM]
[Apr 7, 2021 7:10:27 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: OpenPandemics GPU is LIVE!


Well you can run more than one WU at a time. whistling


I have, but it's a waste of CPU resources to tie up two CPU threads just to boost efficiency when the app could be better optimized to run faster on a single thread.

GPU memory use is very low (~300MB)
GPU memory controller use is very low (0-5%)
PCIe bus use is very low (<1%)

The app just isn't making good use of the available resources. With how short these tasks are, they could probably preload all the data into GPU memory instead of passing it back and forth from system memory (or maybe even the system disk), or better streamline the data transfer process to minimize the time the GPU spends waiting for something to do.

I'm guessing from your statement about the CPU cores needed to run the tasks that you're using an Nvidia GPU. The reason you need a full core for each task is Nvidia's fault. Frankly, their OpenCL support is dismal at best because they are more concerned about CUDA than OpenCL. I can run 0.25 CPUs per task on AMD cards. The fluctuating GPU load comes from one job in the WU finishing and the next one starting; nature of the beast. Run more than one WU concurrently and you won't see that.
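
(For reference, that 0.25 is the <cpu_usage> value in the kind of app_config.xml sketched earlier in the thread; it's a scheduling budget rather than a hard cap, so how much of a core the task really uses still comes down to the driver.)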
----------------------------------------
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.


[Apr 7, 2021 10:37:51 PM]
Kinwolf
Cruncher
Joined: Feb 22, 2011
Post Count: 11
Re: OpenPandemics GPU is LIVE!

I found out that, on my Linux laptop with a Ryzen 4500U, it's not worth the effort to use the GPU version. After fiddling to finally get OpenCL 2.0 installed, I saw that it takes 1 hour to complete a GPU task, while it takes 1 hour 20 minutes on the CPU alone.
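
(If anyone else is fighting the OpenCL install: once a runtime is in place, running clinfo should list at least one platform with a GPU device; if clinfo comes back with zero platforms, BOINC won't see the GPU either.)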

Worse, I have to lower the number of CPU cores that can crunch (in addition to the GPU) to only one. If I use 4 like usual, the heat climbs so much that the CPU clock is throttled back very low and it takes even longer to complete any work units. So I actually crunch more on the laptop using the CPU only: 4 units per 1 hour 20 minutes.
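
Rough numbers, assuming the times above hold: CPU-only is 4 WU per 80 minutes, about 3 WU per hour, while GPU plus one CPU core is roughly 1 WU per hour from the GPU plus 0.75 WU per hour from the remaining core, about 1.75 WU per hour, so CPU-only really does come out ahead on this machine.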

On my regular PC (AMD 5600XT), it takes around 4-5 minutes to crunch a GPU unit.
----------------------------------------
[Edit 1 times, last edit by Kinwolf at Apr 7, 2021 11:11:02 PM]
[Apr 7, 2021 11:09:59 PM]
hiimebm
Senior Cruncher
United States
Joined: Oct 19, 2014
Post Count: 305
Re: OpenPandemics GPU is LIVE!

Takes about 6-7 minutes on an RX 570 (I have mine underclocked to save power; I only game at 75 FPS, not 200+ or something).
----------------------------------------

[Apr 7, 2021 11:17:23 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: OpenPandemics GPU is LIVE!


Well you can run more than one WU at a time. whistling


I have, but it's a waste of CPU resources to tie up two CPU threads just to boost efficiency when the app could be better optimized to run faster on a single thread.

GPU memory use is very low (~300MB)
GPU memory controller use is very low (0-5%)
PCIe bus use is very low (<1%)

The app just isn't making good use of the available resources. With how short these tasks are, they could probably preload all the data into GPU memory instead of passing it back and forth from system memory (or maybe even the system disk), or better streamline the data transfer process to minimize the time the GPU spends waiting for something to do.

I'm guessing from your statement about the CPU cores needed to run the tasks that you're using an Nvidia GPU. The reason you need a full core for each task is Nvidia's fault. Frankly, their OpenCL support is dismal at best because they are more concerned about CUDA than OpenCL. I can run 0.25 CPUs per task on AMD cards. The fluctuating GPU load comes from one job in the WU finishing and the next one starting; nature of the beast. Run more than one WU concurrently and you won't see that.


I wouldn't say it's Nvidia's fault alone. The application here differs between the AMD and Nvidia implementations, and many other projects manage to fully use the GPU on Nvidia without having to run multiple instances to cover the inefficiency of bouncing between 0% and 100%: Einstein FGRPB1G, for example, or the GPUGRID CUDA application (probably the best Nvidia application out there for BOINC), or even the F@H Nvidia apps. The inefficiencies can definitely be fixed in the app without pointing fingers; it just takes some clever coding.

I understand this is a brand-new app; I just hope the project devs aren't against further improvements that can be made.

But honestly, I wish more projects would get some talented CUDA developers, because time and time again CUDA proves faster than OpenCL. Just look at what happened when F@H switched from OpenCL to CUDA for their Nvidia apps: the leaderboards had a pole shift and are now dominated by Nvidia. Take similarly specced AMD and Nvidia GPUs, both with well-optimized applications (OpenCL on AMD, CUDA on Nvidia), and the Nvidia card almost always comes out on top thanks to CUDA.
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
----------------------------------------
[Edit 1 times, last edit by Ian-n-Steve C. at Apr 7, 2021 11:30:19 PM]
[Apr 7, 2021 11:18:31 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: OpenPandemics GPU is LIVE!

I wouldn't say it's Nvidia's fault alone.

We'll just have to agree to disagree. When it comes to OpenCL support for applications on Nvidia hardware you can clearly see the difference, and it's because they would be shooting themselves in the foot if they made OpenCL good enough to compete with CUDA. Another thing: CUDA is proprietary and OpenCL is an open standard, so I wouldn't expect anything else from Nvidia. The researchers who want to do their research here are probably not in a position to pay Nvidia for CUDA support, and why should they? I highly doubt Nvidia would give it away for free. They can accomplish all they need with OpenCL, even if it could probably get done faster with CUDA. And last but not least, OpenCL runs on both AMD and Nvidia; CUDA is Nvidia-only.
----------------------------------------
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.


----------------------------------------
[Edit 1 times, last edit by nanoprobe at Apr 8, 2021 1:38:26 AM]
[Apr 8, 2021 1:36:11 AM]