Thread Status: Active
Total posts in this thread: 14
This topic has been viewed 2191 times and has 13 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks Apparently Taking Too Long- Are They Corrupted Somehow?

Hello WCG.

I had "my466_00009_11" (a HumanProteomeFoldingPhase2 project WU) stuck at 5.909% progress after running for 02:58:14 (as indicated by BOINC_v6.2.28). I opted to abort the WU when its BOINC-indicated crunchTime reached around 03:00:00. I also had another HPF2 WU that ran for 30 hrs (BOINC-indicated), with its progress stuck at around 80+% for about 2 hrs of clock time. I ended up aborting that unit too, so those processing cycles were wasted. I have had no such long BOINC-indicated crunchTimes with other WCG projects' WUs. My rule of thumb is to abort WUs whose BOINC-indicated progress meter does not tick or otherwise shows frozen progress. I'm at a loss as to what it is about these HPF2 WUs that makes them exhibit frozen progress and/or very long crunchTimes.
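The "abort if the progress meter stops ticking" rule of thumb can be made concrete as a stall check over a time window. This is only a minimal sketch: it assumes you have a list of (wall-clock time, fraction done) readings for one task, collected however you like (for example, copied from BOINC Manager by hand). `ProgressSample`, `is_stalled`, and the window length are hypothetical names and parameters, not part of BOINC.

```python
from dataclasses import dataclass


@dataclass
class ProgressSample:
    clock_s: float        # wall-clock seconds since the task started (hypothetical field)
    fraction_done: float  # progress as shown by BOINC, from 0.0 to 1.0


def is_stalled(samples, window_s):
    """Return True if progress has not advanced over the last window_s seconds.

    samples must be sorted by clock_s. Returns False when there is not yet
    enough history to cover a full window.
    """
    if not samples:
        return False
    latest = samples[-1]
    cutoff = latest.clock_s - window_s
    if samples[0].clock_s > cutoff:
        return False  # history is shorter than the window; too early to judge
    # Most recent reading taken at or before the start of the window.
    baseline = max((s for s in samples if s.clock_s <= cutoff), key=lambda s: s.clock_s)
    return latest.fraction_done <= baseline.fraction_done
```

For example, hourly readings of 80%, 81%, 81%, 81% count as stalled for a 2-hour window (no change between hour 1 and hour 3) but not for a 2.5-hour window, because the baseline then falls back to the 80% reading. Comparing against a reading from the start of the window, rather than just the previous one, avoids flagging a task that simply updates its progress meter infrequently.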

Good day, everyone
[Dec 26, 2009 8:14:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks Apparently Taking Too Long- Are They Corrupted Somehow?

This is a long-standing, known, and unresolved issue (the cause is still unknown). Suspending and then resuming the WU may work around the problem.
[Dec 27, 2009 6:26:22 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Tasks Apparently Taking Too Long- Are They Corrupted Somehow?

Generally you should not need to do this, but on HPF2, if you have the setting "remove from memory when suspended" enabled, this method should work. It happens because your work unit has encountered a non-convergence in the code. These issues are hard to track down because of the random, non-deterministic functions used within HPF2. Restarting the application generally causes the functions to take a different route, since a new random number is generated for each step.
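Why a restart can help is easier to see with a toy model. The sketch below is hypothetical and is not HPF2 code: `toy_attempt` is a random walk that draws a fresh random number at every step and tries to reach 0 from 10, but if it wanders up to 20 it is "stuck" and burns its remaining step budget without progress, much like a WU frozen at one percentage. `solve_with_restarts` simply reruns it with a new seed, so the next run takes a different route.

```python
import random


def toy_attempt(rng, max_steps, start=10, trap=20):
    """One run of a toy randomized solver (hypothetical, not HPF2 code).

    A random walk starts at `start` and tries to reach 0, drawing a fresh
    random number for every step. If it wanders up to `trap` it is stuck:
    the remaining step budget is spent without any progress, which stands in
    for a non-converging work unit. Returns the number of steps taken on
    success, or None if the budget runs out.
    """
    x = start
    for step in range(1, max_steps + 1):
        if x >= trap:
            continue  # stuck: budget burns, progress never moves
        x += -1 if rng.random() < 0.5 else 1
        if x == 0:
            return step
    return None


def solve_with_restarts(max_restarts, max_steps, base_seed=0):
    """Re-run the toy solver with a new random stream after each failed run.

    Returns (attempt_index, steps) for the first run that converges, or None.
    """
    for attempt in range(max_restarts):
        rng = random.Random(base_seed + attempt)  # new seed, so a different route
        steps = toy_attempt(rng, max_steps)
        if steps is not None:
            return attempt, steps
    return None
```

With `start=10` and `trap=20`, a single run has roughly a 50% chance of getting stuck (the classic symmetric gambler's-ruin odds), yet a handful of restarts almost always produces a converged run. Whether HPF2's non-convergence behaves this way is an assumption; the model only illustrates the "different random route" idea.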

In general, though, if you notice something strange about a work unit, please post about it in the forums and describe what you are seeing: how the percentage is progressing, your BOINC version, message logs if you can, etc. Other members will usually respond, as they have in this thread, with tips and pointers based on what they have seen as well.

Thanks,
-uplinger
[Dec 28, 2009 3:26:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks Apparently Taking Too Long- Are They Corrupted Somehow?

Hello WCG.

Attention: uplinger
Re: your post: Dec 28, 2009 3:26:51 PM

Sir:

May I suggest that some kind of limit be placed on the amount of non-convergence a WU is allowed before the WU is deemed 'closed' for the current session (that is, the WU is deemed completed on the current cruncher's machine, but needs to go through another round(s) of crunching after it is submitted back to WCG).

Although there may be random, non-determinable functions involved, I assume that the amount of non-convergence is concrete and quantifiable. If so, I suggest imposing a time limit on the non-convergence, say 12 hours: a WU exhibiting symptoms of non-convergence has up to 12 hours to at least equal (if not better) its latest point of least non-convergence. This is a bit like being lost in a forest ('non-convergence' on the way out): we remember the lastPoint known to lead to a way out (if we find another exit point, that becomes the new reference lastPoint for the reckoning), and we stop searching for it after 12 hours without getting back there.
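The proposed rule is essentially a "patience" limit on a best-so-far value. A minimal sketch, assuming the WU can report a single number measuring its non-convergence (lower is better) along with the elapsed time; `PatienceMonitor`, `update`, and that metric are hypothetical, since nothing in this thread says HPF2 exposes such a value.

```python
from dataclasses import dataclass

TWELVE_HOURS_S = 12 * 3600


@dataclass
class PatienceMonitor:
    """Close the WU if the non-convergence measure has not matched or beaten
    its best value (the 'lastPoint') within patience_s seconds."""
    patience_s: float = TWELVE_HOURS_S
    best: float = float("inf")  # least non-convergence seen so far
    best_time_s: float = 0.0    # when that best value was last reached

    def update(self, time_s, non_convergence):
        """Record a reading; return True if the WU should be closed now."""
        if non_convergence <= self.best:
            # Equal or better: this becomes the new reference point.
            self.best = non_convergence
            self.best_time_s = time_s
            return False
        return time_s - self.best_time_s >= self.patience_s
```

For example, readings of 5.0 at hour 0, 6.0 at hour 6, and 5.0 again at hour 11 restart the 12-hour clock at hour 11 (the best value was equalled), so a reading of 5.5 at hour 22 does not close the WU but a reading of 5.5 at hour 23 does. Using "equal or better" rather than strictly better matches the "at least equal (if not better)" wording above.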

Thanks, Merry Christmas, and a Happy New Year to everyone!
[Dec 28, 2009 6:34:05 PM]