World Community Grid Forums
Thread Status: Active | Total posts in this thread: 15
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Just an observation which might be a candidate for consideration in a subsequent update to the client. (Or maybe someone can tell me a better way ...)

I have max_concurrent for ARP1 set to 6 in order to limit memory use, and that has been respected -- except that today I got back to find something I didn't expect. The first thing I saw was that two ARP1 WUs were 'waiting to run'. Then I realised that five others were running. That meant that SEVEN lots of memory had been allocated for these WUs -- not leaving much for the other things I'd like to run from time to time.

I then realised that two MCM1 WUs were 'waiting to run' and two others were executing. One of those had an early deadline, so had obviously jumped the queue. Fair enough, and not necessarily unexpected that an ARP1 WU might be suspended to accommodate it, and there may have been others while I was out. But why is it that, at some point, a 'ready to start' ARP1 WU was started instead of a 'waiting to run' one simply restarting from where it had got to? That decision by the scheduler seems perverse, and is leaving me rather shorter on memory than I'd like to be. (As it happens I run with no swap file or partition.)

Can anyone shed any light on this? I have the Ubuntu-supplied package, which has the manager at 7.6.31.
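(For readers who haven't used the setting mentioned above: max_concurrent is set per application in an app_config.xml file placed in the project's directory under the BOINC data folder, e.g. projects/www.worldcommunitygrid.org/. A minimal sketch follows; the short application name 'arp1' is an assumption here, so check the name your own client shows in task names or the event log.)

    <app_config>
        <app>
            <name>arp1</name>                     <!-- assumed short name for Africa Rainfall Project tasks -->
            <max_concurrent>6</max_concurrent>    <!-- never run more than 6 of these at once -->
        </app>
    </app_config>

(The client re-reads this file via Options > Read config files in the Manager, or on restart.)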
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
If you can reproduce this in 7.14.x or 7.16.x you might find someone to talk to. Until then: Gee Willy, the Debian package maintainer for Ubuntu is already up to at least 7.16.3.

Just add the PPA to get access: https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc
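(For anyone wanting to go that route, the usual Ubuntu steps would be roughly the following; the PPA name is inferred from the Launchpad URL above, so verify it on the page itself before adding it.)

    sudo add-apt-repository ppa:costamagnagianfranco/boinc   # PPA name inferred from the URL above
    sudo apt update
    apt policy boinc-client                                   # compare the distro version with the PPA version
    sudo apt install boinc-client boinc-manager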
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi lavaflow. Thanks for your response.

The WCG position for Linux is to go with the distro-supplied BOINC release, so I'm loath to update it. However, your response implies that the behaviour I reported is indeed unexpected, which is a relief. I also suspect that, even trying deliberately, as an end user I am most unlikely to be able to reproduce this. If someone with the right test environment, who knows how to tweak things (presumably at the server end) to drop in a high-priority WU and see what happens, would like to try, I'd love them to. But for a once-in-a-blue-moon situation like this I'm not going to cry over a possible bug.

Like I said, thanks for the response. It seems I'm not going gaga just yet!
VietOZ
Senior Cruncher | United States | Joined: Apr 8, 2007 | Post Count: 205 | Status: Offline
I'm not an end user by any means, but 'waiting to run' just looks like WUs switching from one to another (higher priority). 'Waiting for memory' is when you should be concerned. Have you checked whether the memory usage actually went overboard?

I've seen this behaviour a lot on both Windows and Linux: some cases similar to yours, some triggered by reducing the concurrent tasks in app_config. But eventually the WUs work themselves out, and it doesn't matter which version of BOINC.
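(If you'd rather check this from a terminal than from the Manager, something along these lines should show it; boinccmd ships with the boinc-client package, and the exact field labels in its output can vary between client versions.)

    boinccmd --get_tasks | grep -Ei 'name|state|working set'   # per-task state and approximate memory footprint
    free -h                                                     # overall RAM and swap actually in use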
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
The BOINC developers made an extensive simulator available where scenarios can be set up to see how the scheduler, in its latest incarnation, would handle this. A new ARP being started while there is one 'waiting to run' should not happen.

As for waiting on WCG or Berkeley to come out with an official release: no, won't happen. They've handed that off to the package maintainers for the respective Linux distros. Go to the WCG download page for Linux and all you get is 'how to download for your distro': https://www.worldcommunitygrid.org/ms/viewDownloadAgain.action#

Costamagna is the 'go to' Debian package maintenance guy, whom you'll find interacting with Mr Anderson on the alpha testers' mailing list; well respected.

[Edit 1 time, last edit by Former Member at Dec 22, 2019 10:16:20 PM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Go to the WCG download page for Linux and all you get is 'how to download for your distro'

That's exactly what the techs said to do when I asked: for Windows get the approved one, for Linux get whatever the distro supports. And, even though your comments are very positive, WCG itself has to maintain a position in terms of what they're going to support by testing, e.g. in their alpha environment.

But you're very persuasive. I'll do some digging and might try to upgrade anyway. This is a situation I can live with for 24 hours (Firefox is a hog, and restarting that always frees quite a bit of memory), but if it happens again I'll certainly upgrade in the hope of stopping it ever happening again.

Thank you for the information. Much appreciated.
TonyEllis
Senior Cruncher | Australia | Joined: Jul 9, 2008 | Post Count: 286 | Status: Offline
Apis Tintinnambulator wrote:
(As it happens I run with no swap file or partition.)

May I ask why?
Run Time Stats https://grassmere-productions.no-ip.biz/
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi Tony,

Writing that made me wonder whether it's still the right thing to do. I always used to run without swap because it gave higher performance, provided you had enough real RAM not to need to worry. But with the advent of SSDs, and with the memory demands of ARP, perhaps I should rethink that (or, at least, do some tests to find out). Old habits die hard, though, and I don't run VMs or other memory eaters, so it's never been an issue until today. Putting more RAM in my lappie is probably too much hassle too, but I'll need a new desk-side machine soon (dream on) and I'll certainly make sure that has plenty of RAM when I configure it.

This is off-topic, otherwise I'd ask other crunchers for comments. But maybe someone would like to start a thread to share views and experience about this, perhaps in the hardware forum?
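(Off-topic as noted, but for anyone who does want to experiment: on a modern Ubuntu a swap file can be added and removed without repartitioning. A rough sketch, with the 8G size purely illustrative:)

    sudo fallocate -l 8G /swapfile     # create the file (use dd instead if your filesystem dislikes fallocate)
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile              # active until reboot; add a line to /etc/fstab to make it permanent
    sudo swapoff /swapfile             # undoes the experiment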
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Oh foo -- I AM going gaga!

I just noticed in my results list that one of the ARP1 WUs from yesterday was a _2 resend. I should think that explains the scheduling behaviour. However, for someone who wants to limit memory usage that's not very helpful. I guess I'll have to think about other BOINC parameters ...
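(One of the 'other parameters' worth a look, if total memory rather than the per-application mix is the real constraint, is a project-wide cap in the same app_config.xml. Reasonably recent clients support it; the numbers below are purely illustrative, and 'arp1' is again an assumed short name.)

    <app_config>
        <project_max_concurrent>8</project_max_concurrent>   <!-- cap across all WCG applications combined -->
        <app>
            <name>arp1</name>
            <max_concurrent>6</max_concurrent>
        </app>
    </app_config>

(The computing preferences' 'use at most N% of memory' limits are the other obvious knob, since they make the client preempt on memory rather than just on task count.)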
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
That foo would almost certainly do this, but it is really also a function of the buffer depth. IIRC, uplinger increased the repair return time to 50%, i.e. 3.5 days. That would cause a resend to be instantly prioritised if the buffer is near 1.75 days (it so happens priority is also triggered by a 50%-of-deadline buffer rule). The deadline of the 'waiting to run' task will likely have aged a bit already, but not enough. At 7 days and an assumed 1.75-day buffer, it would be 7 - 1.75 - runtime already spent; simplified ballpark.

Anyway, it smells like all is well, except for the gaga part, and you can stick to 7.6.31 (I wouldn't).

[Edit 2 times, last edit by Former Member at Dec 23, 2019 12:30:17 PM]
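(A purely illustrative worked example of that ballpark, with the elapsed time made up: a fresh _2 resend with a 3.5-day deadline and a roughly 1.75-day cache already sits at 3.5 - 1.75 = 1.75 days of slack, i.e. right at the 50%-of-deadline trigger, so it goes high priority immediately. A task already 'waiting to run', say 2 days into its original 7-day deadline, still has about 7 - 2 - 1.75 = 3.25 days of slack, so the scheduler sees no urgency in resuming it first.)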