World Community Grid Forums
Thread Status: Active | Total posts in this thread: 34
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

I think the client's performance starts degrading too, given that the manager and core client interact roughly once per second.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

The 1000 limit should be raised to at least 5000, or maybe 10000, but not removed altogether. IIRC, the reason it exists is to prevent a problem machine from executing thousands of work units in a short period with all of them erroring out; it's a sort of fail-safe mechanism. However, with the new AMD EPYC 7702 processor offering 64 cores and 128 threads, the 1000 limit is quite restrictive. Put two 7702s in a dual-socket system and that's 256 threads in one machine.

I'm running first-generation AMD Zen (32 cores per processor) in a dual-socket machine, and I can't run SCC1 on it now. That machine has 128 threads, but I only get 35 WUs per scheduler request, and a request happens every 2 minutes; most of the first 35 have completed before the next scheduler request fires, so I can't keep the machine busy exclusively on SCC1. I loaded it up with MIP1 (1064) WUs and it was empty in 48 hours. WUs per scheduler request, project limits, and BOINC client limits all need to be revisited.
----------------------------------------
[Edit 1 times, last edit by Doneske at Sep 19, 2019 2:59:47 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

"that would be 256 threads"

Another hard restriction: 200 threads is the maximum BOINC supports, or to be more precise, 200 job slots. Those who can afford such hardware are in multi-client concurrent-install territory.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

I have run more than 200 job slots on the 128-thread system; the slot count reached 236. This was due to jobs getting preempted with "leave in memory" specified. Specifically, MCM1 was running when a bunch of FAH2 tasks started and preempted most of the MCM1 work. BOINC didn't say a word. Why would that restriction be there? Disk space? It seems useless...
----------------------------------------
Before running multiple clients concurrently, I would look into re-compiling the client to eliminate the restriction. Same with the 1000 hard limit in the client_state.h file. I've been meaning to give it a try but haven't gotten around to it yet.
[Edit 1 times, last edit by Doneske at Sep 19, 2019 5:05:34 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

Maybe they removed the limit, but the last mention I can find on GitHub is in checkin_notes_2011.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

That looks like a query limit...

I just tested it:
- set ncpus to 256 in cc_config
- started the client and verified it read cc_config
- downloaded 256 jobs; they all started fine
- went to the slots directory; the highest directory number was 255
- stopped the client and reverted cc_config back to 128
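For reference, the override used in that test lives in cc_config.xml in the BOINC data directory (256 here matches the test above; set it back to the real thread count, or remove the element, when done):

```xml
<cc_config>
  <options>
    <!-- Report 256 CPUs to the client instead of the detected count -->
    <ncpus>256</ncpus>
  </options>
</cc_config>
```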
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline

Regarding the 70 work units per core or thread limit:
----------------------------------------
Looks like the number of work units per core/thread is referenced via "max_jobs_exceeded" in /sched/sched_send.cpp:

    if (g_wreq->max_jobs_exceeded()) {

defined in /sched/sched_types.h:

    bool max_jobs_exceeded() {

and dependent on the max_jobs_on_host_proc_type_exceeded value.

Edit: Is /sched (the scheduler) built into the client? If so, then it's not server-side. I'll have to keep looking through the code unless someone can point me in the right direction.
[Edit 5 times, last edit by hchc at Sep 20, 2019 1:59:11 AM]
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline

Regarding the 1000 total runnable jobs limit: like @Doneske said, it's in /client/client_state.h
----------------------------------------
and referenced in /client/work_fetch.cpp:

    if (p->pwf.n_runnable_jobs > WF_MAX_RUNNABLE_JOBS) {

This is hard-coded into the BOINC client, so changing it means changing the code for everyone.

Edited to Add: I opened Issue #3295 in the BOINC GitHub.
[Edit 1 times, last edit by hchc at Sep 20, 2019 4:15:52 AM]
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline

Regarding the 200 concurrent tasks: I can't find that anywhere, but it's a problem for people with EPYC/Threadripper beasts who will have more than 200 threads going full steam.
----------------------------------------
Anyone know where this is defined?

Edit: Looks like @Doneske tested with 256 simulated CPUs, which all ran concurrently, so maybe this is not an issue.
[Edit 2 times, last edit by hchc at Sep 20, 2019 4:14:02 AM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline

The 1000 limit has been brought up a couple of times on the BOINC message boards, and I think David Anderson didn't want to change it. I compiled a BOINC client on a CentOS 7 system when I couldn't find a distributed client that would work, due to library differences or libraries missing entirely. The issue then becomes keeping it updated. Admittedly, the client probably doesn't need to be updated that often, but it would once in a while, and if you have a significant number of hosts, that becomes a chore. If I were more familiar with module mapping from the linker, it would be worth trying to find the constant in a binary module and just zapping it to a different value.

It may be worth bringing this up again, as AMD is changing the landscape with the high-core-count EPYC, Threadripper, and Ryzen parts, and Intel isn't far behind. I'm just wondering if the BOINC ecosystem is becoming slightly tiered, in the respect that there are still many, many systems under 32 cores but also a growing number of high-core-count systems entering the environment. Maybe there needs to be a parameter that can be passed at startup that lets the client cater to larger thread-count systems, such as --LARGE_THREAD_COUNT, telling the client to use larger limits on both the server and client side. By using a parameter, they wouldn't have to maintain different clients, and it would be off by default. Just thinking out loud.