World Community Grid Forums
Thread Status: Active | Total posts in this thread: 9
----------------------------------------
andgra
Senior Cruncher | Sweden | Joined: Mar 15, 2014 | Post Count: 184
Could some knowledgeable people elaborate a bit on this topic, please?

On some rigs the CPU time and the elapsed time are equal, and on some rigs they differ a lot. What does this say? And are there things to tweak for better throughput, depending on whether the times are close together or far apart? I have been playing around a bit with multiple WUs on a GPU, different CPU allocations, etc., but haven't really got my head around it. If someone could share their insight on this topic, it would be very interesting.
/andgra
----------------------------------------
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7716
BOINC runs at the lowest priority, so it gets out of the way if you are doing something else on your machine. The CPU time is the time the processor actually spent executing the job, while the elapsed time is the wall-clock time from when the job started to when it finished. If your machine is busy with other tasks much of the time, the elapsed time will be significantly greater than the CPU time.
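For reference, a quick way to watch both clocks on running tasks is boinccmd. Below is a minimal Python sketch, assuming a local client with boinccmd on the PATH; the field labels ("current CPU time", "elapsed task time") are taken from typical --get_tasks output and may vary between client versions:

```python
# Compare CPU time against elapsed (wall-clock) time for active BOINC tasks
# by parsing `boinccmd --get_tasks` output. Field labels are assumptions
# based on typical client output and may differ between versions.
import subprocess

out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True, check=True).stdout

name, cpu = None, None
for raw in out.splitlines():
    line = raw.strip()
    if line.startswith("name:"):
        name, cpu = line.split(":", 1)[1].strip(), None
    elif line.startswith("current CPU time:"):
        cpu = float(line.split(":", 1)[1])
    elif line.startswith("elapsed task time:") and cpu is not None:
        elapsed = float(line.split(":", 1)[1])
        if elapsed > 0:
            # A ratio near 100% means the task kept a CPU core busy
            # for almost its entire run.
            print(f"{name}: CPU/elapsed = {cpu / elapsed:.0%}")
```

A ratio close to 100% on a lightly loaded machine is what you would expect for CPU work (or for busy-waiting GPU work, as discussed later in this thread).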
Hope this helps. Cheers
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
andgra
Senior Cruncher | Sweden | Joined: Mar 15, 2014 | Post Count: 184
For CPU work I totally agree with you.

But for GPU work I'm interested in the correlation between these times. I know the CPU is "feeding" the GPU, so the CPU isn't doing the calculation itself. What I don't understand is how the two times can differ so much, and whether understanding why would let me tweak settings. Most of the rigs are dedicated to crunching.
/andgra
----------------------------------------
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 153
Are you referring specifically to GPU project(s) on here (aka OPNG), or more in general? You are right, both are used, but how much of each is used, and when, depends on the project. From what I can see of the OPNG work units, which are OpenCL and not CUDA, there are a LOT of gaps in the GPU processing. These could potentially come from a small calculation happening on the CPU during the gaps in GPU utilization, but there is very little bus usage on the GPU during this, so there is no massive data flow between the GPU and CPU for these tasks. Every GPU project will have a bottleneck, and determining what that is can greatly improve throughput.
Observations from OPNG work:

- The memory bandwidth of the GPU does not have a large impact on speed, and memory utilization is extremely small (impact of memory bandwidth tested on NVIDIA P100 GPUs).
- The CPU does not impact this work much because CPU utilization is minimal. We run OPNG work with 0.5 CPU per work unit, and I still think this is entirely overkill.
- GPU utilization ("graphics") fluctuates from 100% to almost 0% very rapidly during an OPNG work unit. Because of this, there is a LOT of down time on the GPU. To maximize throughput, multiple OPNG work units can be run simultaneously to try to prevent downtime for the GPU.
- How many can be run will depend on the client, but I would suggest more than 5 at the same time based on what we have seen here; this will use the GPU to a fuller extent. We run 7x and still see fluctuations in GPU utilization (though minimized compared to running 1x). This is NOT the "magic number" for how many everyone else should run, but it works for our GPUs.

When OPNG work arrives, almost all of it starts simultaneously on our systems, since tasks seem to come in "bundles" of a few work units. There are not a whole lot of tweaks that can be done for OPNG work besides changing the concurrent-work multiplier using an app_config.xml (see the sketch below). Also, because it is OpenCL, there is not much that can be changed on an NVIDIA GPU or in the drivers that I have ever seen. I am definitely not an expert in any way; these are just our observations. Not sure if this answers your question, though.
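For anyone wanting to try the concurrency change described above, here is a minimal app_config.xml sketch. The short app name (opng here) is an assumption; verify it in client_state.xml for your client, and place the file in the World Community Grid folder under projects/ before telling the client to re-read its config files:

```xml
<!-- app_config.xml: run up to 7 OPNG tasks per GPU, 0.5 CPUs each.
     The app name "opng" is an assumption; confirm the short name in
     client_state.xml before using this. -->
<app_config>
    <app>
        <name>opng</name>
        <gpu_versions>
            <!-- ~1/7 of a GPU per task, so the scheduler starts 7 at once -->
            <gpu_usage>0.14</gpu_usage>
            <!-- reserve half a CPU core per task, matching the 0.5 above -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

With gpu_usage at 0.14, 1/0.14 rounds down to 7 concurrent tasks per GPU; use the manager's "Read config files" option (or restart the client) to apply the change.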
----------------------------------------
andgra
Senior Cruncher | Sweden | Joined: Mar 15, 2014 | Post Count: 184
Thanx for your reply, and also to Sgt.Joe earlier!

Yes, I am referring to current OPNG jobs. That gave me some more insight and also confirmed some of my own findings. Some statistical findings:

- Ryzen 3700X + GTX 980: same CPU/elapsed time. Needs multiple simultaneous jobs to keep the GPU utilized. 0.5 CPU/job setting. Around 0.1 h/job elapsed. Windows.
- 4790 + GTX 760: same CPU/elapsed time. Only one job at a time. 100% utilized. Slow... 0.5-1 h/job. Linux.
- 3770 + GTX 1650: same CPU/elapsed time. Only one job at a time. Not fully utilized. Around 0.1 h/job. Linux.
- 3770 + GT 1030: around 25-30% longer elapsed time than CPU time. Only one job at a time. Not fully utilized. Around 0.2-0.4 h/job. Windows.
- 4770 + some really old Radeon: elapsed time 10-15 times the CPU time. Only one job at a time. Not fully utilized. Around 0.6-1.5 h/job. Windows.
- 3217U with integrated GPU: elapsed time 15 times the CPU time. Only one job at a time. Not fully utilized. Around 1.5-3 h/job. Windows.

This will have to do for statistics. I guess the only way is to play around with multiple jobs and CPU share to find the optimum for each rig, as they are quite different; see the throughput sketch below.
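Since per-job elapsed time rises when jobs share a GPU, throughput (jobs per hour) is the fairer yardstick when playing around like this. A small illustrative Python sketch; the numbers are hypothetical, loosely modeled on the 3700X + GTX 980 case above:

```python
# Throughput comparison: jobs per hour = concurrent jobs / hours per job.
# All figures below are hypothetical examples, not measurements.
def jobs_per_hour(concurrent: int, hours_per_job: float) -> float:
    return concurrent / hours_per_job

single = jobs_per_hour(1, 0.10)  # one job at a time, 0.1 h each   -> 10.0 jobs/h
multi = jobs_per_hour(7, 0.55)   # 7 at once, each slowed to 0.55 h -> ~12.7 jobs/h
print(f"1x: {single:.1f} jobs/h, 7x: {multi:.1f} jobs/h")
```

If the multi-job figure comes out lower than the single-job one, the extra concurrency is hurting rather than helping on that rig.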
/andgra
----------------------------------------
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 153
andgra wrote: I guess the only way is to play around with multiple jobs and CPU share to find the optimum for each rig, as they are quite different.

Agreed: play around with it. Definitely watch the "wall clock time" versus anything that is reported.
----------------------------------------
Paul Schlaffer
Senior Cruncher | USA | Joined: Jun 12, 2005 | Post Count: 251
Since you're just referring to OPNG, those tweaks only become relevant when you have a steady flow of incoming WUs to keep the GPUs busy. My workstation has dual workstation-class GPU cards, and I haven't seen that in a very, very, very long time. Therefore, the juice isn't worth the squeeze, as you're not increasing the work successfully completed.
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 153
Paul Schlaffer wrote: Those tweaks only become relevant when you have a steady flow of incoming WUs to keep the GPUs busy... the juice isn't worth the squeeze.

No doubt. I keep these settings in the "hopes" that someday, sometime, OPNG work will flow again. Maybe they are unfounded... but I still keep them. We also only get packets of these work units every once in a while.
----------------------------------------
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 995
Harking back to the original post, and noting andgra's subsequent responses, it may be worth noting that NVIDIA's [sub-optimal] OpenCL implementation busy-waits on the CPU while the GPU computes, which is why CPU time comes out close to elapsed time, unlike with AMD's drivers...
Also, for typical OPNG tasks, GPU idle time falls into two categories: set-up/wrap-up and inter-job. Given that set-up (running AutoGrid) typically takes seconds rather than minutes, and inter-job activity and wrap-up times are both very short, there's not a lot of spare capacity to pick up unless one has a far better GPU than mine (GTX 1050 Ti and GTX 1660 Ti). I found [long ago] that running "two at once" on the 1660 gave a small but not very significant improvement in throughput (at the expense of a constantly howling GPU fan!), so I gave up!

Cheers - Al.

P.S. One of the Einstein@Home projects (BRP7 [Meerkat]) has recently switched from an OpenCL application to a CUDA application for NVIDIA. CPU time is now typically 35 to 40 seconds on either GPU, with elapsed times around 1000 seconds for the 1660 and 2100 seconds for the 1050; the CUDA app is only about 5 to 10% faster than its OpenCL predecessor, though!
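Taking the P.S. figures at face value (CPU time 35 to 40 s, call it 37.5 s), a quick ratio check shows how different the CUDA app's CPU footprint is from the near-100% CPU/elapsed ratio of NVIDIA OpenCL tasks:

```python
# CPU-time-to-elapsed-time ratios from the BRP7 CUDA figures quoted above.
# A low ratio indicates the CPU is not busy-waiting while the GPU works.
for gpu, cpu_s, elapsed_s in [("GTX 1660 Ti", 37.5, 1000.0),
                              ("GTX 1050 Ti", 37.5, 2100.0)]:
    print(f"{gpu}: CPU/elapsed = {cpu_s / elapsed_s:.1%}")
# -> roughly 3.8% and 1.8%, versus ~100% under the OpenCL app
```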