| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 20
|
|
| Author |
|
|
Aperture_Science_Innovators
Advanced Cruncher United States Joined: Jul 6, 2009 Post Count: 139 Status: Offline Project Badges:
|
Over the past couple of days I've noticed, anecdotally, that a couple of my GPU-less Linux boxes seem to be earning far fewer points than they typically do. (On the boxes with GPUs, the inconsistent flow of OPNG WUs makes this very difficult to observe.) I crunched some numbers this afternoon and found some surprising results.

I run a couple of dual-socket Xeon E5 v3/v4 systems. They have a mix of CPU models, but they're all Haswell-EP/Broadwell-EP systems in the 2.2-2.6GHz range. All have Hyperthreading enabled and a relatively skimpy RAM configuration (16 or 32GB, split across 1 or 2 sticks of DDR4-2133 per CPU). These are the ones where I noticed the oddness.

Historically, all of these systems have earned somewhere in the 20-23 points/hr range. There is some fluctuation based on workunit and CPU model; ARP "pays" a bit better in terms of points, but MCM and OPN1 are pretty consistently in the low 20s points/hr. I downloaded a CSV file of the WU results from WCGrid this afternoon and found many instances of these systems getting 11-12 points per hour of runtime on MCM1. I even found about a dozen WUs in the 8 points per hour range!

But on to more concrete numbers. To keep the analysis straightforward, I imposed a cutoff of 15 points/hr to see how many WUs fell above and below it. On these systems, 38% of MCM WUs (850 out of 2219 in the history available in the CSV file) are getting below 15 points per hour. By contrast, only 0.5% (yes, half a percent) of the OPN1 WUs on the same systems are getting < 15 points/hr. So MCM seems to be much more inconsistent here than OPN1 is.

I don't have a lot of data from other platforms (Windows, really) to compare to, but I do have a Windows Xeon E3 V2 system that is roughly comparable (slightly higher clock speed, slightly older architecture). 100% of its MCM WUs (as well as 100% of its 8 OPN1 WUs) are getting > 15 points/hr. So it seems that Linux is giving inconsistent MCM performance, but Windows is not.

I do have a couple of much faster i7 systems running Windows 10. The points/hr aren't comparable (between the newer CPU architectures and nearly double the clock speed), but I can see basically the same trend as on my Xeon E3. The vast majority of WUs are scoring between 50 and 55 points an hour. A few score above 60/hr, and a couple get down as far as 40. But it's a pretty tight distribution overall (stdev = 4.13; by contrast, the stdev on the Linux systems is 3.8, despite a much lower average).

Has anyone else seen similar behaviour, where points are all over the place for MCM on their Linux systems? Or is it that MCM is more bandwidth-sensitive than the other apps, and that is what's tanking the performance? Curious to see what others are seeing, and happy to share the SQL I'm using to pull these numbers if anyone wants to replicate. |
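For anyone who wants to try the same cutoff analysis without SQL, here is a minimal Python sketch. It is not the poster's query: the column names `app`, `cpu_hours` and `granted_credit` are assumptions made for illustration, and the sample rows are made up, so the names would need adjusting to match the real CSV export.

```python
import csv
import io
from statistics import mean, pstdev

CUTOFF = 15.0  # points per hour, the threshold used in the post


def points_per_hour(rows):
    """Yield (app, points/hr) for every row with a positive runtime."""
    for row in rows:
        hours = float(row["cpu_hours"])  # assumed column name
        if hours > 0:
            yield row["app"], float(row["granted_credit"]) / hours


def summarize(csv_text, cutoff=CUTOFF):
    """Return {app: (count, share_below_cutoff, mean, stdev)}."""
    by_app = {}
    for app, pph in points_per_hour(csv.DictReader(io.StringIO(csv_text))):
        by_app.setdefault(app, []).append(pph)
    return {
        app: (len(v), sum(p < cutoff for p in v) / len(v), mean(v), pstdev(v))
        for app, v in by_app.items()
    }


# Made-up rows, only to show the expected shape of the input.
SAMPLE = """app,cpu_hours,granted_credit
MCM1,4.0,88
MCM1,8.0,92
MCM1,4.0,84
OPN1,2.0,44
"""

if __name__ == "__main__":
    for app, (n, below, avg, sd) in summarize(SAMPLE).items():
        print(f"{app}: n={n} below={below:.1%} mean={avg:.1f} stdev={sd:.2f}")
```

The "share below cutoff" figure is the same ratio quoted in the post: 850 of 2219 MCM WUs is about 38.3%.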
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
I run a dual E5-2665 (8GB memory, hyperthreaded, 32 threads). It is running LMDE 4 (debbie) [4.19.0-8-amd64|libc 2.28 (Debian GLIBC 2.28-10. For the last 39 OPN1 units it has averaged 24.43 points per hour. For 50 MCM units it has averaged 14.75 points per hour. Both have been remarkably consistent, within about plus or minus 2 points per hour.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 396 Status: Recently Active Project Badges:
|
I'm seeing 15 points/hour for MCM on my Ubuntu box, a Pentium E5200 with 8GB RAM. Work units are taking 6+ hours to complete.
I'm seeing 70 points/hour for MCM on my Win10 boxes, which have i7-2600, i5-3330 and i5-4590 CPUs with 8GB RAM. Work units are taking 1-2 hours to complete. There are two flavors of MCM work, LOO and NFCV. LOO runs faster than NFCV on Linux. Both flavors take the same amount of time on Windows.
[Edit 6 times, last edit by AgrFan at Nov 7, 2022 1:24:04 AM] |
||
|
|
Aperture_Science_Innovators
Advanced Cruncher United States Joined: Jul 6, 2009 Post Count: 139 Status: Offline Project Badges:
|
Thank you both for chiming in -- glad to hear I'm not crazy :)
I guess I'll try Windows on the next system and see how that fares in comparison. |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
I'd started this before AgrFan's observations on the two MCM1 VMethod options came in, so I have revised it and will post anyway...

It appears that any significant difference in performance between the two methods can be partially attributed to how recently a version of the software was built and what compiler options were deployed (which almost certainly accounts for Linux vs. Windows differences), but how new one's CPUs are may also be significant!

I run MCM1 on four different Linux systems, all of which are a bit newer than AgrFan's systems, so I'd expect them to be a little quicker for Linux vs. Linux comparisons, but I acknowledge that some of the systems take markedly longer than AgrFan's Windows systems to run an MCM1 task! Results on my laptop are so variable that they aren't worth reporting on at all, but the other three may be of interest.

Given how total run time can vary on a machine with a mixed work load, I don't find credit per hour particularly useful when comparing different systems, so I'll only quote total credit here :-), and I would also observe that credit can be variable if one's wingman result produces a very low (or very high) estimate (so I strip obvious outliers when looking at my data...)

My i5-7600 (4 cores, 4 threads, 8GB RAM, clock locked at 3.5GHz) typically takes 3.6 to 3.7 hours for NFCV tasks, 1.65 hours for LOO tasks. Typical total credit is around 115..120 for NFCV and around 65 for LOO.

My i7-7700K (4 cores, 8 threads, 16GB RAM, clock locked at 4.0GHz) typically takes 3.8 hours for NFCV tasks, 1.8 hours for LOO tasks - it's slower than the 7600 because of the threading... Typical total credit is around 115 for NFCV, 67 for LOO - note how close that is to the other Intel system!

My Ryzen 3700X (8 cores, 16 threads, 32GB RAM, power capped at 80W, typical clock 3.9GHz) typically takes 1.8 hours for NFCV tasks and 1.5 hours for LOO tasks. Note how much better it is at NFCV when compared to LOO -- I suspect modern Intel CPUs might show up better for this on Linux too! Typical total credit is 100..105 for NFCV and around 75..80 for LOO, so lower for NFCV but more generous for LOO!

Now, I mentioned compilers... The last time IBM-WCG ran a Beta Test for MCM1 they had obviously used a newer compiler and maths libraries for the Linux version, and NFCV Beta task run times were a lot closer to those for LOO tasks (and occasionally faster, especially on the Ryzen!!!) So when Krembil have finally got WCG into some sort of sane state it might be worth suggesting they do a software rebuild and Beta test with newer compilers...

Cheers - Al.

P.S. Regarding OPN1 credit -- all my systems (including a Raspberry Pi!) tend to score between 85 and 105 total credit per task, whilst the Intel systems take 1.5 to 2 hours per task and the Ryzen 1.0 to 1.3 hours. (I won't embarrass the Pi by quoting run time -- the lack of L3 cache is a disaster for OPN1!) The run time for OPN1 is very dependent on the nature of the ligands presented for docking, and whilst WCG-IBM made their best efforts to achieve something consistent, it doesn't always work :-) |
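The post mentions stripping obvious outliers before looking at credit, but does not say how. One possible approach, shown here purely as an assumption and not as alanb1951's method, is a median-based filter: drop any value that lies more than `k` scaled median absolute deviations from the median. The function name, the threshold `k=3.0` and the sample credits are all hypothetical.

```python
from statistics import median


def strip_outliers(credits, k=3.0):
    """Drop values more than k scaled MADs away from the median.

    The 1.4826 factor makes the MAD comparable to a standard deviation
    for roughly normal data; k=3.0 is an arbitrary illustrative choice.
    """
    credits = list(credits)
    med = median(credits)
    mad = median(abs(c - med) for c in credits)
    if mad == 0:
        # All values (or most of them) are identical; nothing to filter on.
        return credits
    limit = k * 1.4826 * mad
    return [c for c in credits if abs(c - med) <= limit]


if __name__ == "__main__":
    # Hypothetical NFCV credits with one very low wingman-driven grant.
    sample = [115, 118, 120, 116, 117, 40, 119]
    print(strip_outliers(sample))
```

A median-based rule is used in this sketch because a single extreme grant shifts the mean and standard deviation far more than it shifts the median.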
||
|
|
Aperture_Science_Innovators
Advanced Cruncher United States Joined: Jul 6, 2009 Post Count: 139 Status: Offline Project Badges:
|
Thanks a bunch for the numbers, and for the details of the systems you're running them on. I just checked a couple of mine to look at the different WU types now that you've brought that up.

I have an i7-8700K (6c/12t @ 4.7GHz) with 32GB RAM (Windows 10) that seems to do the NFCV WUs in about 1.4 hours and the LOO ones in about 1.6. Interestingly, this is the reverse of what you reported for the relative performance of the two versions, although the times are close. There aren't enough NFCV WUs to get a good average, but it looks like it gets about 80 points each on those, and somewhere around 90-100 (with occasional dips down to 80-ish) for the LOO ones.

By contrast, on one Linux box, a 2696v4 (44c/88t @ ~2.6GHz), the LOOs are taking 3.8-3.9 hours and coming back at about 70 points each. The NFCVs are taking about 8 hours (7.8-8.2 depending on which one) and coming back at 95-115 points each (with occasional ones up to 130-ish, depending on the wingman).

Very interesting behaviour. It would be nice if there were a way to limit the LOOs to Linux, where they perform comparatively better, and NFCV to Windows, which doesn't seem to struggle with them so much. |
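To put the run times and credits quoted above on a common footing, the short sketch below converts them to points per hour. Where the post gives a range, the midpoint is used; that is an assumption made here for illustration, not a measured average, and the dictionary keys are just labels.

```python
# (hours per WU, credit per WU), taken from the figures quoted in the post.
# Midpoints of the quoted ranges are an assumption, not measured averages.
REPORTED = {
    ("i7-8700K / Windows 10", "NFCV"): (1.4, 80.0),
    ("i7-8700K / Windows 10", "LOO"): (1.6, 95.0),   # credit: midpoint of 90-100
    ("2696v4 / Linux", "LOO"): (3.85, 70.0),         # hours: midpoint of 3.8-3.9
    ("2696v4 / Linux", "NFCV"): (8.0, 105.0),        # credit: midpoint of 95-115
}


def points_per_hour(hours, credit):
    """Credit earned per hour of run time for one work unit."""
    return credit / hours


def rate_table(reported):
    """Map each (system, WU type) pair to its points per hour."""
    return {key: points_per_hour(h, c) for key, (h, c) in reported.items()}


if __name__ == "__main__":
    for (system, wu_type), rate in rate_table(REPORTED).items():
        print(f"{system:24s} {wu_type:5s} {rate:5.1f} points/hr")
```

On these midpoint figures the Windows box earns a similar rate for both WU types, while the Linux box earns noticeably less per hour on NFCV than on LOO, which matches the pattern described in the thread.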
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
I decided to look at my Windows 7 Ultimate system, an i7-7700 with 8GB memory. MCM runs much faster on this system than on the older Xeon Linux system. It showed an average of 1.8 hours CPU time, an average granted credit of 84 points, and 45 points/hr. There does not seem to be any significant difference between the kinds of MCM work unit, so I did not separate the LOO and the NFCV units.
The OPN units on this machine all run between 2.8 and 3.5 hours.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Aperture_Science_Innovators
Advanced Cruncher United States Joined: Jul 6, 2009 Post Count: 139 Status: Offline Project Badges:
|
Thanks, Joe. Definitely strange behaviour between them, and I'm looking forward to trying one of these "old" Xeon systems on Windows.
|
||
|
|
rjs5
Cruncher Joined: Jan 22, 2011 Post Count: 6 Status: Offline Project Badges:
|
I have a Windows 11 machine and a Linux Fedora 35 machine that are both crunching MCM. The Windows MCM jobs finish in 1.5 hours and the Linux jobs take 4 to 5 hours.

It is not a Linux problem. It is an MCM performance bug with SPIN LOCKS and CRITICAL REGIONS that causes Linux to get stuck in the KERNEL, probably on a single contested LOCK. It is a fairly simple bug to fix, but it will take some knowledge about CRITICAL REGIONS.

The Windows results are stable and consistent at 1.5 hours. I am running 50% of the CPU on each machine, which is about 18 jobs concurrently.

There is something wrong with the Linux version (wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu). When I start the Linux machine, vmstat shows USER time at 50% and SYSTEM time at about 2%. The USER time (used for doing real MCM work) then drops from 50% to 33% and SYSTEM time jumps to 25%. It looks like MCM has a design problem where each MCM job is spending TOO MUCH TIME in a SPIN LOCK CRITICAL REGION.

Using "vmstat", you can see the USER TIME is 33% and SYSTEM TIME is 25%.

    procs ----------memory----------- -swap --io--- --system--- -----cpu------
     r  b swpd      free buff   cache si so bi   bo     in   cs us sy id wa st
    21  0    0 122294192 3984 5291504  0  0  0    0 194627 2080 33 25 41  0  0
    21  0    0 122292944 3984 5291536  0  0  0   42 193275 2176 33 25 42  0  0
    21  0    0 122292328 3984 5291536  0  0  0    0 195470 2489 33 26 41  0  0
    21  0    0 122292384 3984 5291536  0  0  0 1598 194493 2085 33 25 41  0  0
    21  0    0 122296336 3984 5287400  0  0  0   42 196159 2490 33 25 41  0  0
    21  0    0 122289960 3984 5288984  0  0  0    0 194543 2066 33 25 41  0  0
    21  0    0 122290528 3984 5287704  0  0  0   42 194701 2398 33 26 42  0  0

Using "perf top", you can see that MCM is not doing much on Linux machines .. other than consuming your power and heating your house.
    14.39%  [kernel]                                  [k] syscall_return_via_sysret
    13.71%  [kernel]                                  [k] syscall_exit_to_user_mode
     8.09%  [kernel]                                  [k] entry_SYSCALL_64_after_hwframe
     5.88%  [kernel]                                  [k] __entry_text_start
     2.48%  [kernel]                                  [k] cputime_adjust
     1.71%  [kernel]                                  [k] preempt_count_add
     1.68%  [kernel]                                  [k] _raw_spin_lock_irqsave
     1.57%  [kernel]                                  [k] task_sched_runtime
     1.46%  [kernel]                                  [k] thread_group_cputime
     1.28%  [kernel]                                  [k] do_sys_times
     1.26%  [kernel]                                  [k] update_curr
     1.16%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x000000000016b45f
     1.13%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x000000000016b48e
     1.10%  [kernel]                                  [k] preempt_count_sub
     1.00%  [kernel]                                  [k] copy_user_short_string
     0.98%  [kernel]                                  [k] __x64_sys_times
     0.87%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x0000000000166e51
     0.87%  [kernel]                                  [k] exit_to_user_mode_prepare
     0.86%  [kernel]                                  [k] cpuacct_charge
     0.84%  [kernel]                                  [k] entry_SYSRETQ_unsafe_stack
     0.84%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x00000000001a6e57
     0.83%  [kernel]                                  [k] __cgroup_account_cputime
     0.82%  [kernel]                                  [k] sysret32_from_system_call
     0.80%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x00000000001aeb0a
     0.79%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x0000000000167e9e
     0.78%  [kernel]                                  [k] entry_SYSENTER_compat_after_hwframe
     0.76%  wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu  [.] 0x0000000000166dea
     0.73%  [kernel]                                  [k] do_syscall_64
|
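A quick way to see how lopsided a listing like this is: add up the percentages per object (kernel vs. the MCM binary). The sketch below does that for pasted "perf top" text. The parsing regex and function name are illustrative assumptions about the line layout shown in this post, not a general perf parser, and the sample holds only the first few entries.

```python
import re
from collections import defaultdict

# One "perf top" row as pasted above: percentage, object, [k] or [.], symbol.
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def share_by_object(perf_text):
    """Sum the sampled percentages per object (e.g. [kernel] vs. the app)."""
    totals = defaultdict(float)
    for line in perf_text.splitlines():
        match = LINE.match(line)
        if match:
            pct, obj, _kind, _symbol = match.groups()
            totals[obj] += float(pct)
    return dict(totals)


# A few rows copied from the listing in the post.
SAMPLE = """\
14.39% [kernel] [k] syscall_return_via_sysret
13.71% [kernel] [k] syscall_exit_to_user_mode
1.16% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000016b45f
1.13% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000016b48e
"""

if __name__ == "__main__":
    for obj, pct in share_by_object(SAMPLE).items():
        print(f"{pct:6.2f}%  {obj}")
```

Run over the 28 entries quoted in the post, the kernel rows add up to roughly 62% of samples and the MCM binary's rows to roughly 6%. Those figures cover only the entries shown, not the whole profile.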
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Thanks rjs5.
You have provided a good look at exactly the behavior I have noticed. MCM runs much more quickly on Windows than on Linux. On the Windows system it seems to make little difference which kind of MCM unit it is, LOO or NFCV. On Linux the difference between the two types is much larger, with the NFCV units taking about twice as long as the LOO units. It certainly would be beneficial if whoever wrote the underlying code for Linux would take a look and try to optimize it a bit; that would help the throughput of units.

This reminds me a bit of a database problem many years ago in which a process seemed to take a lot longer than it should have. It turned out there was an indexing loop that was out of place, so the application spent way too much time indexing. A minor rewrite of the looping function resulted in about an 80% speedup in the application. I do realize not all fixes are this simple.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
|