World Community Grid Forums
Thread Status: Active | Total posts in this thread: 15
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Ok, I tested a few more things:
- Resetting the project didn't help.
- Removing and re-installing BOINC didn't help.

What I did find out is that the Ubuntu 18.04 repository ships BOINC 7.9.3, while Ubuntu 20.04 comes with BOINC 7.16.6. On the BOINC download page, 7.16.6 for Linux is marked as a development version: "MAY BE UNSTABLE - USE ONLY FOR TESTING". Maybe there is a reason for this, so my latest theory is that BOINC 7.16.6 is to blame for the problem. As I wasn't able to get an older version of BOINC from the official site to run on my machine, and as a Linux noob don't know how else to install an older BOINC version, I simply installed Ubuntu 16.04 again on my cruncher, and everything works fine. I hope this information helps somebody in the future.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
This issue is related to the app_config.xml file. I had a limit on the number of ARP tasks that could run concurrently. For some unknown reason, the client looked at the app_config, saw a limit of 15, and went on to "assume" I had 9 unused CPUs, when in reality there were 9 OPN1 WUs running on them. The scheduler then downloaded enough work to make up for the 9 supposedly idle CPUs, which was way too much. Without any change to the app_config.xml file, the client corrected itself when the last ARP1 WU ended. If you don't have an application that is limited by app_config running, you probably won't see this issue.
----------------------------------------[Edit 1 times, last edit by Former Member at Sep 21, 2020 11:23:12 PM] |
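For readers who have not used this file, a per-app cap like the one described above might look like the following. This is a minimal sketch, not the poster's actual file; the app name and value are illustrative and would need to match the short app names your client reports in client_state.xml.

```xml
<!-- Hypothetical app_config.xml in the World Community Grid project
     directory; the app name and limit below are illustrative only. -->
<app_config>
    <app>
        <!-- short app name as it appears in client_state.xml -->
        <name>arp1</name>
        <!-- run at most 15 ARP tasks at the same time -->
        <max_concurrent>15</max_concurrent>
    </app>
</app_config>
```

After editing the file, selecting "Read config files" in the BOINC Manager (or restarting the client) makes the limit take effect.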
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1684 | Status: Offline
How the settings in app_config.xml on one side interact with the website settings on the other side (the project WU limit and the cache size) is not really obvious.
----------------------------------------
If the Workunit Cache Setting is too small, the machine may never reach the maximum number of WUs it is allowed to download. If the Project Limit is smaller than the number of concurrent WUs defined in app_config.xml, the latter limit will never be reached. If the limit defined in app_config.xml is tighter than the maximum number of available threads, the machine will not be able to reach its maximum possible performance. It is important to note that app_config.xml defines computation limits only, not computation priority. The execution order remains the usual First In, First Out (FIFO), except when BOINC considers that a reporting deadline cannot be met. Hopefully the explanation is clear enough. Cheers, Yves
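As a sketch of how these limits stack: app_config.xml also supports a project-wide cap alongside per-app ones, and the effective limit is always the tightest of the website Project Limit, the work-buffer size, and these settings. The values below are hypothetical, chosen only to illustrate the interaction.

```xml
<!-- Hypothetical example: the tightest of the configured limits wins. -->
<app_config>
    <!-- never run more than 20 WCG tasks at once, across all sciences -->
    <project_max_concurrent>20</project_max_concurrent>
    <app>
        <name>arp1</name>
        <!-- of those, at most 4 may be ARP1 tasks -->
        <max_concurrent>4</max_concurrent>
    </app>
</app_config>
```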
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Former Member wrote:
> This issue is related to the app_config.xml file. I had a limit on the number of ARP tasks that could run concurrently. For some unknown reason, the client looked at the app_config, saw a limit of 15, and went on to "assume" I had 9 unused CPUs, when in reality there were 9 OPN1 WUs running. The scheduler then downloaded enough work to make up for the 9 idle CPUs, which was way too much. Without changing the app_config.xml file, the client corrected itself when the last ARP1 WU ended. If you don't have an application that is limited by app_config running, you probably won't see this issue.

First time I hear of this variant. The client gets what's specified in the device profile. So if you have 15 ARP1 and specified 24 ARP1, the server will continue trying to send ARP1 until that number has been reached. If you have set no limit on OPN1, then each time work is requested and no ARP1 slot is available, OPN1 will continue to be sent until the buffer limit of minimum + additional days is reached. That's the reason why I set a limit on all sciences, so the buffer does not get loaded with one science or another just because one momentarily has nothing on hand. If you want to posit this is a bug, it is best reported to the Berkeley developers, who actually have a real simulator in which you can enter the parameters and run through various scenarios. Reproducibility and replication are the only way you will get attention there.
[Edit 1 times, last edit by Former Member at Sep 22, 2020 10:08:42 AM]
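The "buffer limit of minimum + additional days" mentioned above corresponds to the client's work-buffer preferences. A hedged sketch of a global_prefs_override.xml that keeps the buffer small (the values are illustrative, not a recommendation) would be:

```xml
<!-- Hypothetical global_prefs_override.xml in the BOINC data directory;
     a small work buffer limits how much extra work the scheduler fetches. -->
<global_preferences>
    <!-- store at least this many days of work -->
    <work_buf_min_days>0.5</work_buf_min_days>
    <!-- plus up to this many additional days -->
    <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>
```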
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I did post this as a bug and ran into a recalcitrant Jord, after which I deleted my ID and vowed never to post another bug there again. I don't really care if the issue ever gets resolved, and evidently neither does Berkeley. I know it's a client bug and can reproduce it at any time. It has nothing to do with the WCG limit settings on the website. David Anderson made a change to the scheduler, and he even admits the change is not ideal, but it was all he was going to do for the time being. Admittedly, if this is the point being made, one could set limits on the projects at the website, and that would prevent the client from downloading too much work because it would hit those limits first. However, the website has a maximum of 64, which isn't ideal for those with 128 CPUs (threads). The only other option is NOLIMIT, and hence the problem.
----------------------------------------
I could probably spend some time going through the scheduler code and find where the problem lies, like I had to do to prove to them that SELinux was causing an issue with the client recognizing the VirtualBox installation. But since they have decided to take a combative attitude (I guess I called their baby ugly), they can wallow in their own muck as far as I'm concerned. Looking at the GitHub site, I think one would need to go back to a 7.14.x release (prior to David Anderson's change) to avoid the problem.
[Edit 1 times, last edit by Former Member at Sep 22, 2020 3:13:20 PM]