World Community Grid Forums
Thread Status: Active Total posts in this thread: 9
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline |
I have only recently returned to running mapping cancer markers.
When available I run 2 OPNG tasks on a GeForce GT 1030 GPU and allocate 1 CPU thread to support them on a 12-thread Ryzen 5 3600 CPU with 16 GB of RAM. I have noticed that when running 2 OPNG, 10 OPN1 and 1 MCM1, the MCM1 task only gets about half the CPU cycles of the other WCG work units - 3.5-4.0% compared to 7.7-8.1%. The MCM1 task uses less memory than the other tasks and is showing no disk I/O. Has anyone else noted this? Can anyone offer an explanation? |
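As a rough sanity check on those figures, here is a minimal sketch. It assumes the percentages are shares of the whole 12-thread CPU, as Windows Task Manager reports them; the post does not say which tool was used. The function name `thread_share` is made up for illustration.

```python
# Assumption: the quoted percentages are shares of the WHOLE CPU, so one fully
# busy thread on an N-thread CPU shows up as 100 / N percent.
LOGICAL_THREADS = 12  # Ryzen 5 3600, from the post


def thread_share(percent_of_cpu, threads=LOGICAL_THREADS):
    """Convert a whole-CPU percentage into a fraction of one hardware thread."""
    return percent_of_cpu * threads / 100.0


full_thread_pct = 100.0 / LOGICAL_THREADS
print(f"one full thread = {full_thread_pct:.2f}% of the CPU")

# Figures quoted in the post.
for label, pct in [("MCM1 low", 3.5), ("MCM1 high", 4.0),
                   ("other tasks low", 7.7), ("other tasks high", 8.1)]:
    print(f"{label}: {pct}% of CPU = {thread_share(pct):.2f} of one thread")
```

On that assumption the other tasks sit at roughly 0.92 to 0.97 of a thread each, while the MCM1 task gets a little under half a thread, which matches the "about half" observation.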
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 902 Status: Offline |
You don't specify your operating system, so my observations may not be 100% useful as they're all Linux-based...

Firstly, note that if you are running 10 OPN1 tasks on that machine at the same time, they will run somewhat slower than if you only ran (say) 4 or 5 at a time; OPN1 doesn't hammer the L3 cache as much as MIP1 did, but running lots of them does seem to slow down everything else to a degree. Also, with that many tasks, OPN1 is probably running more or less as slowly as it's likely to, so any performance hit due to lost CPU cycles is less likely to be obvious - the cycles are already lost waiting for memory!

Secondly (and this is the bit I can't be sure is the same on Windows), the NVIDIA drivers on Linux tend to run in a mode that needs an entire CPU per OPNG task all the time the GPU is associated with the program; the driver spends quite a lot of its time spinning its wheels waiting for things to do!

Another thing that may or may not be the same on Windows as it is on Linux -- BOINC runs GPU jobs at a higher priority than CPU jobs, so those CPU-hogging NVIDIA OPNG jobs will be more likely to get processor time than MCM1 and OPN1 (although, as mentioned earlier, that may be more likely to show up against MCM1 than OPN1).

And are you really trying to run two OPNG tasks on a GT 1030 at the same time? I found that (on Linux) running one at a time was optimal for a 1050 Ti and two for a 1660 Ti - my i7-7700K slowed down by an unacceptable amount if I tried to run two at once on the 1050 Ti with anything else running! The main reason for running more than one at once is to fill in the gaps between periods of GPU usage, and on slower GPUs the proportion of run time wasted isn't really high enough to justify the extra system load!

By the way, I've found that if one doesn't leave at least one full core free for the O/S to play in, all BOINC applications run slower; in fact, on my i7-7700K I only allow BOINC access to 5 out of 8 threads, and on my Ryzen 3700X I allow it access to 11 out of 16 threads. In fairness, I should point out that I've got some quite busy system monitoring tools running as well (so [at least] one thread is needed to avoid those having problems); however, it was definitely the case that using any more threads for BOINC saw application performance fall away quite rapidly.

Typical profile on the Intel is 1 GPU job (WCG or otherwise), 1 OPN1 job, 2 ARP1 jobs and 2 MCM1 jobs; on the Ryzen it's typically 1 GPU job (or 2 OPNG), 3 ARP1, 1 OPN1, 5 MCM1. Note the low OPN1 -- people who only have CPUs are welcome to my share!

Don't know whether the above is of any help... As I said, all the above is based on Linux experience, and is specific to my use case.

Cheers - Al. |
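For anyone wanting to try the same thread limits, BOINC's computing preferences express this as "Use at most N % of the CPUs" rather than as a thread count. The sketch below only does that conversion for the thread counts mentioned in the post; the helper name `cpu_percent_for` is hypothetical, and it is worth checking in the client's event log how many CPUs it actually ends up using after rounding.

```python
def cpu_percent_for(threads_allowed, threads_total):
    """Percentage to enter so that BOINC may use `threads_allowed` of `threads_total`."""
    if not 0 < threads_allowed <= threads_total:
        raise ValueError("threads_allowed must be between 1 and threads_total")
    return 100.0 * threads_allowed / threads_total


# Thread limits quoted in the post.
for machine, allowed, total in [("i7-7700K", 5, 8), ("Ryzen 3700X", 11, 16)]:
    pct = cpu_percent_for(allowed, total)
    print(f"{machine}: {allowed} of {total} threads = {pct:g}% of the CPUs")
```

The same helper gives 87.5% for a "7 of 8" setup.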
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline |
Thank you for your information.
My PC uses Windows 10 as an operating system but your comments are still relevant.

I realise that my GT 1030 is a low-end GPU and runs the OPNG work units much slower than high-powered GPUs. However, by running two OPNG work units simultaneously the temperature of the GPU remains more or less constant instead of oscillating by about 30 deg C as data is passed between the CPU and GPU.

I rely on the operating system's priorities to run its own tasks (some high priority), other programs (normal priority) and then the BOINC work units (low priority). Provided the ratio of CPU time to elapsed time is more than 90%, I am quite happy.

I had known about the effect of MIP1 and ARP1 work units on level 3 cache, but had not seen any reference to MCM1 work units having a similar problem. However, I do not know of any method of checking the usage of level 3 cache. |
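The 90% rule of thumb can be checked from the CPU time and elapsed time that BOINC reports for each task. This is a minimal sketch of that check; the function names and the sample times are hypothetical, not taken from any real result.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Ratio of CPU time to elapsed (wall-clock) time for one task."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def is_acceptable(cpu_seconds, elapsed_seconds, threshold=0.90):
    """True when the task got more than `threshold` of a thread while it ran."""
    return cpu_efficiency(cpu_seconds, elapsed_seconds) > threshold


# Hypothetical tasks: (name, CPU time in hours, elapsed time in hours).
samples = [("well-fed task", 3.45, 3.60), ("starved task", 1.80, 3.60)]
for name, cpu_h, elapsed_h in samples:
    ratio = cpu_efficiency(cpu_h * 3600, elapsed_h * 3600)
    verdict = "ok" if is_acceptable(cpu_h * 3600, elapsed_h * 3600) else "below target"
    print(f"{name}: {ratio:.0%} ({verdict})")
```

A task that only gets about half a thread, like the MCM1 task described at the top of the thread, would land near 50% on this measure.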
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 902 Status: Offline |
I understand about loading up the GPU to try to keep a more stable temperature - fair enough!
Hmm - perhaps I didn't phrase that bit about MCM1 and OPN1 as well as I might have... It isn't MCM1 having L3 cache problems, it's simply that if total throughput drops because of heavy load you're more likely to notice the effect on an MCM1 task because it isn't already slowed down by cache accesses. MCM1 and SCC1 (when available) seem to be the tightest applications regarding memory access by a significant margin, and they only slow a little if L3 cache is being hammered by something else.

Practical experiments can help with performance measuring for MCM1 and ARP1, which tend to have reasonably consistent run times if the system has a more or less constant, consistent workload. OPN1 tasks, however, can be a little more variable as run time depends on the complexity of the ligands being processed and will also vary somewhat depending on the receptor being targeted.

Note that there are two types of MCM1 task, one of which takes over twice as long as the other on my i7-7700K (3.8 hours versus 1.8 hours) but about 25% longer on the Ryzen with its better floating point bandwidth (1.6 hours versus 1.35 hours), so you'll see different run times for MCM1 anyway, but they should cluster around two common points.

You may find you get to "do more science" if you run more MCM1 and less OPN1 - I certainly did! However, as you've got fewer cores arguing about cache access, you may find otherwise...

Happy crunching - Al.

P.S. The tools I use when I want to check MIPS, cache access, branch misses and other performance indicators are Linux-specific; I know there are similar tools for Windows, but I've no idea what they are (or how to use them...). |
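If you collect MCM1 run times from your own results pages, the two clusters can be separated without knowing the task type in advance. The sketch below simply splits a sorted list at its largest gap, which is a simplification that only works when the two groups are clearly apart; the sample run times are invented for illustration.

```python
from statistics import mean


def split_two_clusters(run_times_hours):
    """Split run times into (short, long) groups at the largest gap between neighbours."""
    times = sorted(run_times_hours)
    if len(times) < 2:
        raise ValueError("need at least two run times")
    # Pair each gap with the index of the value just before it, then take the widest gap.
    gaps = [(times[i + 1] - times[i], i) for i in range(len(times) - 1)]
    _, cut = max(gaps)
    return times[:cut + 1], times[cut + 1:]


# Hypothetical MCM1 run times in hours, loosely shaped like the i7 figures above.
sample = [1.8, 3.9, 1.7, 3.7, 1.9, 3.8]
short, long_ = split_two_clusters(sample)
print(f"short cluster: {short}, mean {mean(short):.2f} h")
print(f"long cluster:  {long_}, mean {mean(long_):.2f} h")
```

Comparing the two cluster means before and after a configuration change is then a fairer test than comparing individual tasks.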
||
|
HGRAY515
Cruncher Joined: Jun 23, 2011 Post Count: 11 Status: Offline |
alanb1951, I run Windows 10 on an i7 processor with 16 GB of RAM and a Radeon 520 with 2 GB of memory. I am currently set to process on all 8 cores of the i7, running MCM on the CPU and OPNG on the GPU (only). Other than social media and streaming sports, the load on the PC is low. Would you recommend setting BOINC to 7 (or fewer) cores, if that would improve BOINC performance?
|
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 902 Status: Offline |
HGRAY515,
How much relief one might get from running fewer tasks will depend on what tasks one is running, of course, and your proposed job-mix is not likely to hit one of the major performance degradation factors, cache issues (which leave applications memory-bound!). Such issues can be caused either by applications with memory access algorithms that aren't cache-friendly, or because applications are being suspended by the O/S for long enough for other applications to use the cache instead! The former can only be resolved by reducing the number of cache-hungry tasks one runs at once, whilst the latter may be less of an issue if one has a spare CPU thread or two...

On WCG, MIP1 was the real cache-killer, but that's finished now; ARP1 and HST1 like a fair amount of L3 cache, OPN1 doesn't seem to be as bad but still shows minor performance issues if one runs lots of them at once. MCM1 seems to be extremely L3-cache friendly, so is less likely to be affected.

There are other factors that can influence throughput. For instance, a long time ago I saw some thermal throttling on my i7 when I was running lots of concurrent MCM1 and SCC1 (another high-performance WCG project!) during a UK "heat wave" -- I resolved that with a mixture of a better cooling solution and a slight CPU de-clocking until ambient temperatures dropped! Also, some CPUs will throttle if they draw power above inbuilt limits; there are applications out there that can work CPUs hard enough to cause that to happen, but I don't think any WCG application is likely to do that (they don't use the most energy-intensive floating point instructions, as not all CPUs support them...).

So there are no definite rules one can offer! The best way to find out what suits you is to experiment, then decide whether you prefer the results of the changes you make! :-) After all, your performance goals may not be the same as mine (and Windows is not Linux!)

[Edit. forgot to mention this...]

Note that there are two different sorts of MCM1 task, one of which takes longer to run than the other! If you look at the output report on the WCG site, you'll find a line for the VMethod parameter, which will be either NFCV or LOO. The NFCV tasks take longer on the current version of the application. On my Intel machines, all of which are Kaby Lake CPUs, so quite old, NFCV tasks take around twice as long as LOO tasks; on my Ryzen 3700X (newer(!) and with improved pipelines) the difference is smaller, but still obvious. Newer Intel CPUs may also show less difference, but I can't vouch for that...

So, if comparing run times, you'll need to be aware that there are likely to be a pair of typical run times for MCM1. At present, on my i7 (which is currently clocked at 3.9 GHz) I get about 3.9 hours for NFCV, 1.9 hours for LOO; on an i5-7600 (clocked at 3.5 GHz) I get about 3.6 hours for NFCV, about 3.4 hours for LOO; the Ryzen gets about 1.65 hours for NFCV and about 1.4 hours for LOO. (The i5 runs fewer tasks than the i7 and has less cache contention; the Ryzen has better arithmetic pathways...)

[...end of edit.]

As I told ca05065, I'm not a Windows user, so my experiences are entirely Linux-based; in such an environment it definitely helps to have a free thread or two, but my work mix is different... For example, on the laptop from which I'm posting this, I have 2 cores/4 threads and let BOINC have half the CPUs (running 1 each of OPN1 and MCM1 or, rarely, an HST1 task) - those jobs run a lot better when the laptop is left on (screen-locked) when not in use as my "daily driver" - the browsing, working spreadsheets or whatever else takes up the slack threads. And my other systems run more diverse workloads so I expect them to have issues if I over-allocate resources!

Cheers - Al.

P.S. you'd get better/more useful insights if a Windows user responded, I suspect!

[Edit 2 times, last edit by alanb1951 at Sep 8, 2021 4:34:24 AM] |
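To put the quoted run times side by side, the sketch below computes the NFCV-to-LOO ratio per machine. The numbers are copied from the post as written; the dictionary layout and function name are just for illustration.

```python
# Approximate MCM1 run times in hours, as quoted in the post.
RUN_TIMES_HOURS = {
    "i7 (3.9 GHz)": {"NFCV": 3.9, "LOO": 1.9},
    "i5-7600 (3.5 GHz)": {"NFCV": 3.6, "LOO": 3.4},
    "Ryzen 3700X": {"NFCV": 1.65, "LOO": 1.4},
}


def nfcv_to_loo_ratio(times):
    """How many times longer an NFCV task runs than a LOO task on one machine."""
    return times["NFCV"] / times["LOO"]


for machine, times in RUN_TIMES_HOURS.items():
    print(f"{machine}: NFCV takes {nfcv_to_loo_ratio(times):.2f}x as long as LOO")
```

With the figures as quoted, the i7 comes out at about 2.05 and the Ryzen at about 1.18, while the i5 is only about 1.06, well short of the "around twice as long" described for the Intel machines, so that pair of numbers may be worth checking against your own results before relying on it.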
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline |
I have decided to avoid the problem of MCM1 slowing significantly when OPNG are also running by ceasing to request MCM1. I will restart MCM1 when it is obvious that OPNG work units have been paused.
@HGRAY515 I use Windows 10 on a Ryzen 5 3600. In the heat of the past few days, TThrottle has throttled back the CPU to prevent its temperature rising above 79 deg C. This is well below the level of 95 deg C at which the chip limits itself, but I hope to extend the life of the chip by being conservative. |
||
|
HGRAY515
Cruncher Joined: Jun 23, 2011 Post Count: 11 Status: Offline |
I decided to test using only 7 of 8 cores for a week, starting last night. I will post an update once some OPNG GPU tasks have been processed, likely this weekend, and I will try to determine whether the MCM1 tasks process differently. Thanks for the suggestions. Worth a try for a week.
|
||
|
HGRAY515
Cruncher Joined: Jun 23, 2011 Post Count: 11 Status: Offline |
The results of my switch to running on only 7 of 8 cores: MCM tasks were running on 7 of 8 cores and appeared to run slightly slower (that was surprising). For OPNG, which I run on the GPU only, tasks were running about 50% or more faster.
|
||
|