| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 14
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello WCG.
I had "my466_00009_11" (a HumanProteomeFoldingPhase2 project WU) stuck at 5.909% progress after running for 02:58:14 (as indicated by BOINC_v6.2.28). I opted to abort the WU when its BOINC-indicated crunchTime reached around 03:00:00. I also had another HPF2 WU that ran for (BOINC-indicated) 30hrs with the progress stuck at around 80+% for about 2hrs clockTime. I ended up aborting that unit as well; there go wasted processing cycles. I have had no such long-duration BOINC-indicated crunchTimes with other WCG projects' WUs. My rule of thumb is to abort WUs whose BOINC-indicated progress meter does not tick or otherwise shows frozen progress. I'm at a loss as to what it is about these HPF2 WUs that makes them exhibit frozen progress and/or very long crunchTimes. Good day, everyone. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Long-standing known and unresolved issue (the cause is still unknown). Suspending and then resuming the WU may work around the problem. |
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
|
Generally you should not need to do this, but on HPF2, if you have the setting "remove from memory when suspended", this method should work. The reason it is happening is that your work unit has encountered a non-convergence in the code. These issues are hard to track down because of the random, non-deterministic functions that are used within HPF2. Restarting the application generally causes the functions to take a different route, since a new random number is generated for each step.
But generally, if you notice something strange about a work unit, please post in the forums about what you are seeing: whether the percentage is increasing, the BOINC version, message logs if you can, etc. Generally the other members will respond as they have in this thread, giving you tips and pointers on what they have seen as well. Thanks, -uplinger |
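As a rough illustration of why a restart can help (this is not the HPF2 code, which is not shown in this thread): if a stochastic search fails to converge with one stream of random numbers, rerunning it with a fresh stream may take a different route. The names `toy_search` and `run_with_restarts` below are hypothetical, and the toy objective (walking a number toward zero) is only an assumption chosen to keep the sketch small.

```python
import random


def toy_search(rng, max_steps=10_000, tol=1e-3):
    """Hypothetical stand-in for a stochastic search.

    Walks x toward 0 using random proposals and returns x once it is
    within tol, or None if it has not converged after max_steps.
    """
    x = 10.0
    for _ in range(max_steps):
        candidate = x + rng.uniform(-1.0, 1.0)
        if abs(candidate) < abs(x):  # accept only improving moves
            x = candidate
        if abs(x) < tol:
            return x
    return None  # non-convergence within the step budget


def run_with_restarts(search, max_restarts=5, seed_source=None):
    """Rerun search with a fresh random stream until it converges.

    Returns (attempt_number, result), or None if every attempt failed.
    """
    seed_source = seed_source or random.SystemRandom()
    for attempt in range(1, max_restarts + 1):
        seed = seed_source.randrange(2**32)  # new seed per restart
        result = search(random.Random(seed))
        if result is not None:
            return attempt, result
    return None


if __name__ == "__main__":
    print(run_with_restarts(toy_search))
```

Suspending and resuming a WU with the application removed from memory plays the role of one such restart: the same work is retried, but along a different random path.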
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello WCG.
Attention: uplinger Re: your post: Dec 28, 2009 3:26:51 PM Sir: May I suggest that some kind of limit be clamped on the amount of non-convergence a WU is allowed before the WU is deemed 'closed' for the current session (that is, the WU is deemed completed for the current cruncher's machine, but needs to go through one or more further rounds of crunching after it is submitted back to WCG). Although random, non-deterministic functions may be involved, I assume that the amount of non-convergence is concrete and quantifiable. If so, I suggest that a time limit, say 12 hours, be imposed on the non-convergence; that is, a WU exhibiting symptoms of non-convergence has up to 12 hours to at least equal (if not better) the latest point of least non-convergence. This is like getting lost in a forest ('non-convergence' on the way out of the forest): we stop looking for the lastPoint (the point known to lead to a way out of the forest; if we find another exit point, that becomes the reference lastPoint for the reckoning) after 12 hours of searching for a way out. Thanks, Merry Christmas, and a Happy New Year to everyone! |
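The suggestion above can be sketched as a small watchdog loop. This is only an illustration under stated assumptions: that the application can report a numeric "non-convergence" score after each step (lower is better), which the thread does not confirm for HPF2. The names `run_with_stall_limit`, `step`, and `clock` are hypothetical.

```python
import time


def run_with_stall_limit(step, stall_limit_s=12 * 3600, clock=time.monotonic):
    """Stop a search that has not matched its best score for too long.

    step() is assumed to return the current non-convergence score
    (lower is better), or None once the work is finished.
    Returns ("finished", best) or ("stalled", best).
    """
    best = float("inf")
    last_good = clock()  # time of the latest point of least non-convergence
    while True:
        score = step()
        if score is None:
            return "finished", best
        if score <= best:
            # Equalled or bettered the reference point: reset the timer.
            best = score
            last_good = clock()
        elif clock() - last_good > stall_limit_s:
            # No return to the reference point within the limit: close the WU.
            return "stalled", best


if __name__ == "__main__":
    # Fake clock that advances one "hour" per reading, with a 3-hour limit.
    hours = [0]

    def fake_clock():
        hours[0] += 1
        return hours[0]

    scores = iter([5.0, 3.0] + [4.0] * 10)
    print(run_with_stall_limit(lambda: next(scores, None),
                               stall_limit_s=3, clock=fake_clock))
```

Following the post, equalling the best score resets the timer. A stricter variant could require a strict improvement, so that a score that merely repeats cannot hold the clock open indefinitely.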
||
|
|
|