Thread Status: Active
Total posts in this thread: 20
Posts: 20   Pages: 2   [ Previous Page | 1 2 ]
This topic has been viewed 4929 times and has 19 replies
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Status: Offline
Re: Very Wide Points Spread on Linux?

I have Linux machines like this, all running a full load of MCM on all CPUs:

2x 2696V4, 88 threads, 256 GB memory, 0.3% system time, Linux 5.4.0
2x 2696V2, 48 threads, 384 GB memory, 3% system time, Linux 6.0.0
2x 2680V4, 40 threads, 64 GB memory, 1% system time, Linux 4.19

If it were true that MCM tasks on Linux used a lot of system time, I would expect the 88-thread system to show a lot of system time - which I don't see.

vmstat -w 2 on the first machine certainly doesn't show anything obviously wrong:


$ vmstat -w 2
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa st
92 0 150152 247455408 192376 7694464 0 0 0 19 0 0 74 22 4 0 0
88 0 150152 247455136 192380 7694464 0 0 0 122 23574 1798 100 0 0 0 0
92 0 150152 247455136 192380 7694464 0 0 0 0 23567 1675 100 0 0 0 0
93 0 150152 247455136 192380 7694464 0 0 0 0 23612 1780 100 0 0 0 0
89 0 150152 247455392 192396 7694456 0 0 0 394 23637 1768 100 0 0 0 0
93 0 150152 247453632 192396 7694464 0 0 0 22 23573 1803 100 0 0 0 0
89 0 150152 247454512 192396 7694464 0 0 0 296 23738 1767 100 0 0 0 0
95 0 150152 247454128 192404 7694464 0 0 0 228 23698 2055 100 0 0 0 0
96 0 150152 247455088 192404 7694464 0 0 0 0 23577 1733 100 0 0 0 0
93 0 150152 247455088 192404 7694464 0 0 0 26 23611 1856 100 0 0 0 0
91 0 150152 247455824 192412 7694464 0 0 0 38 23630 1773 100 0 0 0 0
96 0 150152 247455536 192412 7694464 0 0 0 0 23600 1793 100 0 0 0 0
93 0 150152 247400032 192420 7695304 0 0 0 106 27419 7306 99 1 0 0 0
90 0 150152 247396640 192420 7694228 0 0 0 278 23780 2065 100 0 0 0 0
91 0 150152 247414048 192428 7694220 0 0 0 180 23601 1755 100 0 0 0 0
90 0 150152 247414784 192428 7694232 0 0 0 20 23544 1831 100 0 0 0 0
88 0 150152 247414528 192428 7694232 0 0 0 0 23564 1719 100 0 0 0 0
90 0 150152 247414592 192432 7694228 0 0 0 108 23559 1752 100 0 0 0 0
89 0 150152 247426784 192432 7694232 0 0 0 16 23673 1896 100 0 0 0 0
104 0 150152 247426656 192432 7694232 0 0 0 292 23591 1805 100 0 0 0 0
93 0 150152 247427168 192440 7694232 0 0 0 72 23512 1661 100 0 0 0 0
90 0 150152 247427168 192440 7694232 0 0 0 18 23557 1809 100 0 0 0 0
94 0 150152 247427264 192448 7694224 0 0 0 680 23784 1839 100 0 0 0 0
94 0 150152 247428032 192448 7694232 0 0 0 0 23605 1807 100 0 0 0 0
88 0 150152 247428096 192448 7694232 0 0 0 50 23629 1780 100 0 0 0 0
92 0 150152 247428352 192456 7694232 0 0 0 90 23607 1844 100 0 0 0 0


What exactly is the command to get that 'perf' output? I can start perf and zoom in on a process, but it doesn't look anything like what you posted.
[Jan 13, 2023 7:18:54 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Very Wide Points Spread on Linux?

@rjs5

Ryzen 3700X and Intel i7-7700K here, both running Xubuntu 22.04. MCM1 tasks on the Ryzen usually take between 1.25 and 2 hours depending on other work running (and which of the two MCM1 task types it is!), whereas MCM1 on the i7 can take from 1.6 to 5 hours!

Whilst I often use perf stat to investigate various aspects of a task's performance, I'm not that familiar with perf top, so I'm not sure what parameters you used for that report, and I'd be interested to know... I tried it on the Ryzen with an MCM1 PID and my output didn't look anything like yours, so perhaps I wasn't using it right :-) -- however, the display did suggest I could get a "higher level view" with some options it suggested (--sort comm,dso), so I tried that too; that showed over 99% process, less than 1% kernel.

It is worth noting that MCM1 tasks have two threads, one of which is presumably where the process management functions (watchdog timer?; checkpoint management?) live and seems to be mostly idle[1], whilst the other is the compute thread. So (still on the Ryzen) I tried perf top on the thread IDs for the same task, and got something more akin to your display from the secondary thread, whilst the primary thread display was nothing like it (no kernel references in the top 50 lines...). I had to tune the sample event period to stop it complaining about being "too slow to read the ring buffer", but I don't suppose that would make such a huge difference to the output...

Still on the Ryzen, I used the sort option on the thread IDs to see what would happen, and observed the following...

  • For the thread ID that matched the PID it reported 99% or more main process, 1% or less kernel;
  • for the other thread ID it reported 75% or more kernel, 25% or less main process.
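For anyone wanting to try the same per-thread inspection, a rough sketch of it as a script (a sketch only: the PID fall-back and the 10-second timeout are my additions, not what I actually typed, and perf normally needs root or a relaxed kernel.perf_event_paranoid to show kernel symbols):

```shell
#!/bin/sh
# Per-thread profiling sketch. Pass a real MCM1 PID as $1; we fall back
# to this shell's own PID just so the thread discovery is demonstrable.
PID=${1:-$$}

# Every thread of a process appears as a directory under /proc/PID/task;
# the primary thread's TID equals the PID itself.
TIDS=$(ls /proc/"$PID"/task)
echo "threads of $PID: $TIDS"

# Profile each thread separately; --sort comm,dso collapses samples into
# the process-vs-kernel split described above. A lower -F sample rate
# helps if perf complains about being too slow to read the ring buffer.
if command -v perf >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    for TID in $TIDS; do
        echo "=== thread $TID ==="
        timeout 10 perf top -t "$TID" --sort comm,dso -F 997 --stdio
    done
fi
```

The perf loop is skipped unless you're root with perf installed, so the TID discovery part is safe to run anywhere.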

I looked at the Ryzen first because it's so much quicker on the longer-running (NFCV) tasks, but I eventually got round to the i7 and (surprise, surprise) it seems to behave as indicated in your post!

And I checked vmstat (as per thunder7's post, which turned up whilst I was writing this...) and my Ryzen has almost no "system" time (but lots of idle...) whilst the Intel has a consistent 15% system and 50% user... Of course, that's not all about MCM1 tasks on either system, but it's interesting to note the generally higher apparent overheads on the Intel with my mix of BOINC and non-BOINC work...

Now, both systems run the same executable (as far as I am aware - it's certainly the same version number...) and are on the same Linux kernel (5.15 family), the same system libraries and the same BOINC client. So why the huge difference in behaviour??? Are the basic stats used by perf collected differently on the two different system types?

All that said, recalling how much faster the last Beta-test MCM1 program was on Linux over two years ago, I suspect the real performance fix might be simply to recompile with a newer Linux compiler and libraries! Sadly, that Beta didn't result in a new application (and meant that the ARM version they trialled hasn't seen the light of day - pity that, so much more efficient than OPN1...)

Cheers - Al.

[1] Verified with perf stat on both the Intel and the Ryzen.
[Jan 13, 2023 9:30:13 PM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Status: Offline
Re: Very Wide Points Spread on Linux?

On the other hand, I just noticed my 88 thread machine seemed to be gathering points more slowly than expected.

top output:

top - 10:44:18 up 27 days, 16:35, 2 users, load average: 92.33, 92.03, 91.89
Tasks: 1055 total, 89 running, 966 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 35.0 sy, 64.7 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257815.8 total, 241312.1 free, 8527.6 used, 7976.2 buff/cache
MiB Swap: 2048.0 total, 1901.4 free, 146.6 used. 247489.2 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1393819 boinc 39 19 79936 79168 2444 R 100.0 0.0 266:03.18 wcgrid_mcm1_map
1502554 boinc 39 19 77204 76520 2444 R 100.0 0.0 209:49.03 wcgrid_mcm1_map
1529230 boinc 39 19 76804 76104 2444 R 100.0 0.0 201:48.88 wcgrid_mcm1_map
1742065 boinc 39 19 73524 72732 2444 R 100.0 0.0 130:36.70 wcgrid_mcm1_map
1988203 boinc 39 19 69608 68896 2444 R 100.0 0.0 49:16.25 wcgrid_mcm1_map
2022040 boinc 39 19 68984 68340 2444 R 100.0 0.0 37:26.99 wcgrid_mcm1_map
1392938 boinc 39 19 80156 79500 2444 R 99.7 0.0 273:45.66 wcgrid_mcm1_map
1393822 boinc 39 19 79944 79208 2444 R 99.7 0.0 267:00.22 wcgrid_mcm1_map
1394114 boinc 39 19 79848 79196 2444 R 99.7 0.0 265:43.75 wcgrid_mcm1_map
1394979 boinc 39 19 79516 78824 2444 R 99.7 0.0 257:54.17 wcgrid_mcm1_map
1394983 boinc 39 19 79556 78828 2444 R 99.7 0.0 257:36.47 wcgrid_mcm1_map
1395274 boinc 39 19 79416 78772 2444 R 99.7 0.0 257:12.44 wcgrid_mcm1_map


35% system time? Oops!

sudo perf top:

Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 796031096256 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
19.32% [kernel] [k] do_syscall_64
7.34% [kernel] [k] syscall_return_via_sysret
6.27% [kernel] [k] cputime_adjust
6.05% [kernel] [k] entry_SYSCALL_64
1.60% [kernel] [k] thread_group_cputime
1.51% [kernel] [k] update_curr
1.46% [kernel] [k] entry_SYSCALL_64_after_hwframe
1.44% [kernel] [k] __calc_delta
1.23% [kernel] [k] do_sys_times
1.22% [kernel] [k] task_sched_runtime
1.06% [kernel] [k] _raw_spin_lock_irqsave
1.05% [kernel] [k] __x64_sys_times
0.97% [kernel] [k] copy_user_generic_unrolled
0.96% [kernel] [k] __lock_text_start
0.94% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000023064


Hmmmmm.
[Jan 14, 2023 9:50:31 AM]
rjs5
Cruncher
Joined: Jan 22, 2011
Post Count: 6
Status: Offline
Re: Very Wide Points Spread on Linux?

Well ... when I use perf top, I just use "perf top". It is VERY helpful when the binary has symbols.

It looks like the running-slow symptom is my problem and not MCM. The i9-9980XE CPU is sending a signal to Linux indicating a thermal problem, and Linux is "idling" for a period of time.

I looked around to see if I could get more information on the LOCK problem and found:
  • perf lock record
  • Ctrl-C after a few seconds
  • perf lock report

The RECORD command stores its data in a file called "perf.data"; REPORT reads and formats that data.

The hottest lock turns out to be "pkg_thermal_notify" which indicates my CPU is overheating. 8-(


perf lock report
Name acquired contended avg wait total wait max wait min wait

pkg_thermal_noti... 56421 56421 31.11 us 1.76 s 120.87 us 2 ns
futex_wake+0xa0 4460 4460 28.56 us 127.37 ms 106.07 us 1.54 us
__folio_end_writ... 320 320 11.84 us 3.79 ms 22.30 us 3.02 us
rcu_core+0xd4 219 219 3.52 us 770.77 us 9.73 us 1.82 us
futex_q_lock+0x26 199 199 7.78 us 1.55 ms 63.20 us 1.42 us
futex_wake+0xa0 56 56 22.70 us 1.27 ms 45.87 us 2.38 us
raw_spin_rq_lock... 56 56 5.34 us 298.90 us 16.62 us 2.26 us
raw_spin_rq_lock... 51 51 6.55 us 334.08 us 21.43 us 2.55 us
raw_spin_rq_lock... 49 49 13.04 us 638.97 us 36.17 us 2.82 us
raw_spin_rq_lock... 46 46 8.14 us 374.45 us 21.57 us 2.44 us
raw_spin_rq_lock... 41 41 6.49 us 266.29 us 12.54 us 2.45 us
btrfs_read_lock_... 40 40 20.20 us 808.07 us 71.87 us 1.83 us
raw_spin_rq_lock... 37 37 6.07 us 224.74 us 15.19 us 2.59 us
tick_do_update_j... 37 37 6.94 us 256.72 us 25.97 us 2.30 us
raw_spin_rq_lock... 35 35 6.64 us 232.44 us 12.37 us 2.53 us
__queue_work+0x174 34 34 4.51 us 153.33 us 8.50 us 1.42 us
raw_spin_rq_lock... 34 34 6.76 us 229.97 us 19.46 us 1.90 us
[Jan 14, 2023 3:20:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Very Wide Points Spread on Linux?

Just to add another data point, dual 2695v3, 100% MCM, linux 5.4.0:

Tasks: 593 total, 29 running, 564 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 1.1 sy, 49.0 ni, 49.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 32055.3 total, 28563.2 free, 2443.8 used, 1048.3 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 29174.4 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10866 boinc 39 19 73696 72924 2392 R 105.9 0.2 77:06.78 wcgrid_+
10809 boinc 39 19 73696 72928 2392 R 100.0 0.2 146:00.80 wcgrid_+
10812 boinc 39 19 73696 72924 2392 R 100.0 0.2 144:53.36 wcgrid_+
10814 boinc 39 19 73696 72928 2392 R 100.0 0.2 141:50.95 wcgrid_+
10819 boinc 39 19 73696 72924 2392 R 100.0 0.2 137:53.19 wcgrid_+
10824 boinc 39 19 73696 72928 2392 R 100.0 0.2 131:02.93 wcgrid_+
10835 boinc 39 19 73696 72928 2392 R 100.0 0.2 113:49.22 wcgrid_+
10859 boinc 39 19 73696 72928 2392 R 100.0 0.2 89:27.37 wcgrid_+
....
[Jan 14, 2023 6:32:51 PM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Status: Offline
Re: Very Wide Points Spread on Linux?

Just like that, this morning it's gone again:

Samples: 14M of event 'cycles', 4000 Hz, Event count (approx.): 3596859994781 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
4.88% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000022a28
4.12% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x00000000000177b1
3.02% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x00000000000177aa
2.60% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x00000000000177a0
2.42% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024309
2.29% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024314
2.26% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000017346
2.16% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000022a38
2.14% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024316
2.13% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000002430c
2.09% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000022a33
1.56% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024311
1.10% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x00000000000242f7
1.04% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024300
0.95% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000022a00
0.86% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000007990
0.82% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024305
0.81% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000002428d
0.64% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000002431a
0.64% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000005ff9
0.63% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000001733a
0.62% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000018ab2
0.57% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000006010
0.54% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x00000000000186c2
0.54% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000024290
0.53% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x000000000001734a
0.52% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000017330
0.48% wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu [.] 0x0000000000018ab6
0.47% perf [.] queue_event

This machine runs no other software, so it's probable that MCM itself causes it. I do notice that yesterday was a bad day in terms of results (60% of the average of the past 4 days) and points (75% of the average of the past 4 days), but not runtime (the same as the past 4 days).

I know I get 2 types of MCM tasks - the short and the long ones.
The short ones take a bit less than 4 hours, the long ones take a bit less than 8 hours.
short: https://www.worldcommunitygrid.org/contribution/workunit/245413338

long: https://www.worldcommunitygrid.org/contribution/workunit/245384006

But I'm not sure whether they take longer because 35% of the CPU power is spent on system calls, or whether they are simply different and the extra system time is not the cause of the longer computation.
Yesterday a cursory glance told me I had a lot of the 'long' tasks.

I am somewhat suspicious that the long unit above was solved by another Linux user in 1.82 hours. A Xeon V4 @ 2.8 GHz isn't the fastest machine, but is there something out there that runs 4 times faster? And why does that system claim just 10% more points for four times the speed?
[Jan 16, 2023 5:30:33 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Very Wide Points Spread on Linux?

I am somewhat suspicious that the long unit above was solved by another Linux user in 1.82 hours. A Xeon V4 @ 2.8 GHz isn't the fastest machine, but is there something out there that runs 4 times faster? And why does that system claim just 10% more points for four times the speed?
I'm not surprised by the 1.82 hour time for one of the "long" MCM1 tasks (VMethod=NFCV) -- in my earlier post in this thread I mentioned that my Ryzen 3700X gets through [longer] MCM1 tasks in under 2 hours... And it typically takes 1.25 to 1.4 hours for the shorter ones (VMethod=LOO).

As for the points score, that's based on typical times for the particular host, so [for instance] if it's twice as fast that doesn't entitle it to twice the points per task for doing the same amount of work in half the time -- it gets double the points by doing twice as many tasks :-)
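To put toy numbers on that (made-up figures, not the actual WCG credit formula):

```shell
# Hypothetical example: host B is 4x faster than host A, but credit is
# granted per task, so the speed advantage shows up in daily throughput,
# not in the points claimed for a single task.
hours_per_task_A=8
hours_per_task_B=2                 # 4x faster
points_per_task=100                # same work => similar per-task credit

tasks_per_day_A=$((24 / hours_per_task_A))
tasks_per_day_B=$((24 / hours_per_task_B))
echo "A: $((tasks_per_day_A * points_per_task)) points/day"   # 300
echo "B: $((tasks_per_day_B * points_per_task)) points/day"   # 1200
```

Same points per task (give or take the claim mechanism), four times the points per day.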

By the way, as I had samples of both LOO and NFCV tasks in my present work mix, I looked at an example of each using perf top on both my Ryzen and my Intel i7. The LOO ones show almost no kernel activity in the main thread, whilst the NFCV main threads show a lot of kernel activity on both systems! That explains the differences I reported in that other post...

However, whatever it is in the NFCV code-path that provokes the extra kernel time, it doesn't seem to burn off such a high proportion of CPU time on the Ryzen...

Cheers - Al.
[Jan 16, 2023 6:54:01 AM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Status: Offline
Re: Very Wide Points Spread on Linux?

I can only dream of getting a reaction from the person/team that 'creates' the NFCV tasks...
[Jan 16, 2023 7:11:43 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Very Wide Points Spread on Linux?

I can only dream of getting a reaction from the person/team that 'creates' the NFCV tasks...
Nowadays, that would be the same folks as run WCG :-)

For what it's worth, I checked back to the last BETA tests done for MCM1[1] on WCG/IBM as I had a memory of them being faster... However, it appears that they did two sets of tests, the first using NFCV and the second using LOO (but with a different, more complex, set of control parameters). The NFCV tasks showed a minimal improvement in performance on my Ryzen and were, if anything, slightly slower on the Intel; the LOO tasks were quicker on both platforms, but how much of that was down to re-compilation and how much to the re-jigged parameters is unknown.

So it does look like a code-path issue, and whether some changes might be possible remains to be seen -- for now they're probably far more invested in finishing the post-transfer "snagging" (especially the bits regarding communication between the two databases, which seem to be at the core of most outstanding issues not related to work accessibility).

Cheers - Al

[1] The BETA ended without comment as to whether anything useful was learnt, and nothing appeared to change regarding production MCM1 work......
[Jan 16, 2023 3:35:24 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Very Wide Points Spread on Linux?

Not that I want to get too involved in a highly technical explanation of the differences between the LOO and NFCV units, but it would be nice to at least hear a layman's explanation of the differences and what is being learned from each variation. Just from the disparities in the times, I know they are doing something different during their calculations. Just what that might be would be interesting to know.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jan 16, 2023 4:18:47 PM]