World Community Grid Forums
Thread Status: Active | Total posts in this thread: 6
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline
As some of us already know, the current BOINC client has some problems when running Windows systems with >64 CPU threads. In such a case, the available CPU threads are split into multiple CPU groups (NUMA nodes) and it's up to the application how it assigns the affinity of its child processes/threads.

Currently BOINC relies on the default system scheduler, which is not ideal in this case: it doesn't seem to perform effective load balancing. The result is that after a while most of the BOINC child processes are assigned to one CPU group, while the other group(s) remain almost idle. That means overload/over-scheduling on one CPU group and underload on the others.

I have already submitted this problem to the BOINC forums and issue tracker:
https://boinc.berkeley.edu/dev/forum_thread.php?id=10124
https://github.com/BOINC/boinc/issues/1357
but it seems the developers don't care about it.

So I have decided to create a workaround for this case until (hopefully some day) the BOINC team addresses it. I have created a tool that checks all WCG processes running in the system and spreads their NUMA node affinity across all CPU groups. If anyone has such problems or is interested, let me know...
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
"but it seems the developers don't care about it."
There is no longer a BOINC team [see publication **]. All development comes from outside/volunteers, where David Anderson's involvement seems to be to pull code check-ins at times. To get attention, post to the BOINC alpha mailing list. If you have a workaround that is easily portable into the BOINC open source code, it stands a chance of being incorporated [but is it a Windows-only problem, as the discussion seems to indicate?]

** knreed's name still gets mentioned elsewhere as being server lead, so I'm not sure how fresh the governance document is.
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline
I'm aware of that change, but I submitted this problem almost a year ago.

I also understand that the developers think the operating system should manage group affinities more effectively, but unfortunately this doesn't seem to be the case. Nevertheless, my main intention was to provide this workaround to all users experiencing this problem.
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline
Here's v1.2 of the tool: www.hwinfo.com/beta/NUMA_Balancer_1_2.zip

By default, when launched, it checks all running WCG processes/threads. If it determines that some of the NUMA nodes are overloaded (while others are not), it balances the NUMA assignment (by adjusting NUMA node affinity) for all WCG processes/threads. For best performance it's recommended to start this tool every few minutes, preferably via Task Scheduler.

When started with the "-w" option, the tool waits for a keystroke at the end, so you can see the output. There are additional options available to use it for any processes (not just WCG). Let me know if you're interested and I'll describe them.

[Edit 1 times, last edit by Mumak at Jun 16, 2016 8:51:04 AM]
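For readers curious what "balancing the NUMA assignment" could look like in principle, here is a minimal sketch of the planning step only. It is not the code of NUMA Balancer; the function name `plan_rebalance` and its inputs are hypothetical, and it makes no Windows API calls. It assumes every process counts as one unit of load and that group indices run from 0 to `group_count - 1`.

```python
def plan_rebalance(assignments, group_count):
    """Return {pid: new_group} moves that even out process counts per group.

    assignments: dict mapping a process id to its current CPU group index
                 (hypothetical input; a real tool would query the OS for it).
    group_count: number of CPU groups in the system.
    """
    # Collect the processes currently sitting in each group.
    by_group = {g: [] for g in range(group_count)}
    for pid in sorted(assignments):
        by_group[assignments[pid]].append(pid)

    moves = {}
    while True:
        busiest = max(by_group, key=lambda g: len(by_group[g]))
        idlest = min(by_group, key=lambda g: len(by_group[g]))
        # Stop once the groups differ by at most one process.
        if len(by_group[busiest]) - len(by_group[idlest]) <= 1:
            return moves
        # Move one process from the most loaded group to the least loaded one.
        pid = by_group[busiest].pop()
        by_group[idlest].append(pid)
        moves[pid] = idlest
```

On Windows, applying such a plan would presumably involve setting each process's or thread's group affinity (for example via `SetThreadGroupAffinity`), which this sketch deliberately leaves out.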
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
"As some of us already know, the current BOINC client has some problems when running Windows systems with >64 CPU threads."
Recent posts suggested that problems already start with more than 32 threads running HST1:
https://secure.worldcommunitygrid.org/forums/...ead_thread,38956_offset,0
but I have no idea whether that bears in any way on your NUMA issue. (No, moi is not particularly interested in getting any the wiser on the matter, since 8 threads is the max I can run concurrently ;).

[Edit 1 times, last edit by SekeRob* at Jun 16, 2016 9:07:29 AM]
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline
I don't think the HST1 problem mentioned there is the same as the one described here. The NUMA problem occurs only on systems that use multiple CPU groups, which is always the case when there are >64 CPU threads in a system.

One can also set up a multi-group system with <64 threads, but that has to be done manually, since Windows doesn't do it on such systems by default.

Whether a system is affected by the NUMA problem is easy to check: open Task Manager, switch to Performance, right-click on the graph and change it to "NUMA nodes". If that entry is greyed out, your system has just 1 group and this doesn't apply. If you are running at full CPU load and see that one of the NUMA nodes is at 100% while the other one is much lower, then the system is affected: its work threads are not optimally balanced across all CPU threads.

[Edit 1 times, last edit by Mumak at Jun 16, 2016 11:25:44 AM]
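The Task Manager check above can be written down as a simple rule of thumb. The sketch below is only an illustration: the function name `numa_imbalance` and the two thresholds (`busy`, `gap`) are assumptions chosen for the example, not values taken from the tool, and the per-node percentages are expected to be read off manually or supplied by some other monitoring source.

```python
def numa_imbalance(node_loads, busy=95.0, gap=30.0):
    """Return True if per-node CPU loads look like the imbalance described.

    node_loads: CPU utilisation in percent for each NUMA node.
    busy: load (assumed threshold) at which a node counts as saturated.
    gap: minimum difference (assumed threshold) between the most and
         least loaded node to call the system imbalanced.
    """
    # A single node/group means the problem cannot occur.
    if len(node_loads) < 2:
        return False
    highest, lowest = max(node_loads), min(node_loads)
    return highest >= busy and highest - lowest >= gap


# Example: one node pegged at 100% while the other sits at 40%.
print(numa_imbalance([100.0, 40.0]))
```

A system where both nodes sit near 100% would not be flagged, since that is the expected picture at full, well-balanced load.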