World Community Grid Forums
Thread Status: Active Total posts in this thread: 106
|
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
nanoprobe, what's the deadline on these? September 10th. And FWIW it's happening on 3 different crunchers. Now it gets even stranger. One cruncher stopped 6 Leishmaniasis WUs that had been running for a total of 4 hours and started 6 new ones, all with the Sept. 10th due date. Why this could happen depends on: What's your connect setting? What's your additional buffer setting? What's the "Switch between apps" time setting? These 3 work together and can push the panic state envelope. With an initial estimated run time uncertainty for DSFL, totals could have inflated. Certainly the first 6 received had a 4-day deadline. The later ones had 10 days, and all of these came with 8:20 hours TTC. They completed in ~4:30, and after a half dozen the ones waiting in cache still show 6:42 TTC. Everything runs orderly, 2 day cache (to force a fetch at all, normally 1 day), connect 0.0 days. --//-- |
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
That type of behavior happens to me all the time. BOINC is supposed to complete the units with the earliest deadline when it goes into panic mode, but it does a really, really bad job of determining which work units to crunch.
Typically, this will occur after a particularly long-running result. This pushes the expected run time of everything in the queue much higher, triggering high-priority running. From this point it is speculation on my part, but it appears that BOINC calculates the number of results that it expects to be able to complete before the deadline (based on estimated run times) and jumps ahead of everything that it assumes will not meet the deadline. This results in it abandoning its currently running units and jumping to a seemingly random point in the queue and crunching from that point. Once the estimated run times get back to normal, it will pick up the units in progress and start back at the top of the queue. Not much that you can do about it; just wait for things to sort out, and they will over time. As Sekerob said, new projects are more likely to have inaccurate run time estimates.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
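The effect speculated about in the post above can be illustrated with a small sketch. This is not BOINC's scheduler code; it is a simplified earliest-deadline-first projection on a single core, and the names `Task` and `tasks_at_risk` are hypothetical. It only shows how one inflation factor applied to every estimate can push work units with otherwise comfortable deadlines into the set that looks like it will miss.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_hours: float       # estimated remaining run time
    deadline_hours: float  # hours from now until the deadline


def tasks_at_risk(tasks, inflation=1.0):
    """Return the names of tasks projected to miss their deadline.

    Assumption: tasks run one after another in earliest-deadline order,
    and `inflation` scales every run time estimate equally.
    """
    at_risk = []
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        elapsed += task.est_hours * inflation
        if elapsed > task.deadline_hours:
            at_risk.append(task.name)
    return at_risk


# Ten hypothetical work units, 8 hours each, all due in 4 days (96 hours).
queue = [Task(f"wu{i}", est_hours=8.0, deadline_hours=96.0) for i in range(10)]

normal = tasks_at_risk(queue, inflation=1.0)    # nothing projected to miss
inflated = tasks_at_risk(queue, inflation=2.0)  # tail of the queue at risk
```

With the estimates doubled, the same queue that fit comfortably now has its last four work units flagged, which is the kind of situation that would trigger high-priority running.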
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There's an amount of "try" and then it does it with the jobs that are at the end of the queue (and to the viewer seemingly randomly when the printed times of those are the same). If it happens all the time, then the answer is in the combined settings of the 3 items listed and the inflation effect, mostly in play when having high variability (HCMD2, e.g.). Drives the rDCF up the wall.
Mind you, I've tried to force the issue a few times under 6.12, and it was not easy to get it into panic. The scheduling logic seems to be improved quite a bit, for those with larger settings. --//-- |
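A rough sketch of the inflation effect mentioned above, assuming a duration correction factor that jumps up immediately after an overrun and relaxes by only a fraction per completed result. The update rule and the 10% relaxation rate are assumptions for illustration, not taken from the BOINC source, and `update_dcf` is a hypothetical name.

```python
def update_dcf(dcf, estimated_hours, actual_hours, relax=0.1):
    """Return the new correction factor after one completed result.

    Assumed rule: if the result ran longer than the corrected estimate,
    jump straight to the observed ratio; otherwise move only `relax`
    of the way toward the observed ratio.
    """
    ratio = actual_hours / estimated_hours  # relative to the raw estimate
    if ratio > dcf:
        return ratio
    return dcf + relax * (ratio - dcf)


dcf = 1.0

# One result estimated at 4 hours that actually takes 12 hours.
dcf = update_dcf(dcf, estimated_hours=4.0, actual_hours=12.0)
after_overrun = dcf

# Five results that take exactly the raw estimate.
for _ in range(5):
    dcf = update_dcf(dcf, estimated_hours=4.0, actual_hours=4.0)
after_five_normal = dcf
```

Under these assumptions a single long result triples every estimate in the queue, and the factor is still above 2 after five normal results, which is why a project with highly variable run times can keep the estimates inflated.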
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
Hi nanoprobe. Mine have jumped the queue as well. They are working in the same way as the C4CW's. I had HFCC's ticking along nicely with earlier due dates. All now waiting to run. Still, great new project. I've never had this issue before. DSFL seems to completely ignore the "switch between applications every xxxx minutes" setting by at least 4 days. It's only happening on my Intel crunchers. I've had to manually suspend numerous DSFL WUs to let other projects finish WUs that were stopped with only minutes left to completion.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline |
Welcome to WCG, Carlos and the Drug Search for Leishmaniasis (DSFL) project!
I don't remember welcoming a new project to WCG on its FIRST day!
----------------------------------------
[Edit 1 times, last edit by BladeD at Sep 1, 2011 10:52:26 AM] |
||
|
coolstream
Senior Cruncher SCOTLAND Joined: Nov 8, 2005 Post Count: 475 Status: Offline |
I have just received two resends of WUs that had previously errored out. They are
https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=329165940 and https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=329165914
Both had the same exit message (which I have never seen before):
<core_client_version>6.10.17</core_client_version> <![CDATA[ <message> too many exit(0)s </message> ]]>
----------------------------------------
Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Regrettably, no members except those in the actual quorum can see the details.
Generally that "too many exit(0)s" is a system instability error... too busy and regressing to the previous checkpoint too many times. 6.10.17 hints, if not mistaken, at Linux? It's never been an ultra-stable version. By the Launchpad listing of BOINC packages, from Ubuntu Lucid (10.04), it's at least 6.10.58 as recommended. Summary: Don't think you'll have trouble with these. --//-- |
||
|
coolstream
Senior Cruncher SCOTLAND Joined: Nov 8, 2005 Post Count: 475 Status: Offline |
Sorry about the links. I keep forgetting that they are https.
It would be very strange if these had been processed on a Linux box because they have been sent to my Win7 64-bit machine. That's unless the 'same platform' rule is being ignored.
----------------------------------------
Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
:D
Ah, all the more reason for the wingman to go get an upgrade, as that version was short lived on Windows. It's just that 6.10.17 seems to be a frequent wingman on my Linux box, so [mistake as always], guessed it to be a Linux rig. Knowing the platform etc helps us to not stray into that territory ;>) --//-- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
P.S. AFAIK, Human Proteome Folding 2 is the only project where "homogeneous redundancy" is not applied. Those "work was available, but only for other platforms" messages gave that away at launch time. Had it yesterday for a little while, until uplinger said he'd upped the weight.
--//-- |
||
|