World Community Grid Forums
Category: Completed Research
Forum: OpenZika
Thread: all wus exceeding elapsed time limit
Thread Status: Active
Total posts in this thread: 19

Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0
In Chris's example above, the Result Log included an Unhandled Exception. Is that normal after the elapsed time limit has been exceeded, or does it indicate some other problem? The final task log entry is at [17:30:02], whereas the Dump Timestamp is 05/20/16 08:38:39. If both are local time, that looks strange to me.

SekeRob
Master Cruncher · Joined: Jan 7, 2013 · Post Count: 2741
Good point... it looks like the task kept running as a zombie**, then self-destructed when it reached the internal time-limit point. A hard look at the security software logs may be in order, along with checking system temperatures and running some memtest86 diagnostics.
----------------------------------------
** The zombie part is dubious... task processes have to exchange a 'keep alive' signal with the BOINC client at least once every 30 seconds, or they are reset (heartbeat loss; the status resets to zero).
[ot]Congrats on your re-entry into the Société Club Max... I don't expect to get my renewal card on my crunching account until later next week... I'm hunting other objectives and taking whatever comes along in the mix.[/ot]
[Edit 1 times, last edit by SekeRob* at May 21, 2016 1:51:44 PM]
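The keep-alive rule described above boils down to a timeout check. Below is a minimal, generic Python sketch, assuming the 30-second window stated in the post; `HeartbeatMonitor`, `beat` and `lost` are hypothetical names, and this is not BOINC's actual implementation.

```python
import time

# Assumed window, taken from the "at least once every 30 seconds" figure in the post.
HEARTBEAT_TIMEOUT_S = 30.0


class HeartbeatMonitor:
    """Generic watchdog that reports heartbeat loss when no keep-alive arrives in time.

    Illustrative only; not BOINC's code.
    """

    def __init__(self, timeout_s: float = HEARTBEAT_TIMEOUT_S, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable clock, so the check can be tested
        self.last_beat = clock()    # treat start-up as the first keep-alive

    def beat(self) -> None:
        # Record a keep-alive from the other side.
        self.last_beat = self.clock()

    def lost(self) -> bool:
        # True once the gap since the last keep-alive exceeds the timeout.
        return self.clock() - self.last_beat > self.timeout_s


# Usage with a fake clock so the example is deterministic.
now = [0.0]
mon = HeartbeatMonitor(clock=lambda: now[0])
now[0] = 29.0
print(mon.lost())  # False: 29 s since start, still inside the window
mon.beat()
now[0] = 60.0
print(mon.lost())  # True: 31 s since the last beat at t=29
```

A real monitor would poll `lost()` periodically and reset the task when it returns True, which matches the "heartbeat loss" behaviour described in the post.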

Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0
The same error was reported in beta as well, on a slow Intel Atom X5-z8500, but there the dump timestamp is a few tens of minutes after the last log entry.

i007008
Cruncher · Joined: Sep 16, 2005 · Post Count: 21
To add a bit more detail to my last post: with my computing preference set to store 0.7 days of work, I received some 80 Zika WUs with an estimated completion time of 19 mins 53 secs each. Multiplied by 40 (from SekeRob*'s post), that would allow approximately 13 hours 15 mins for completion. As I said earlier, most WUs complete in about 8 hours, which is fine, and for the 4 WUs currently running, Elapsed time plus Remaining time is just under 8 hours. However, I have had 2 WUs which, at somewhere between 10 and 13 hours of elapsed time, showed a “Remaining time” of 2 to 3 days, and these are the ones that errored out.
FYI, the laptop is not throttled, so it runs at full speed, although WUs are occasionally suspended for a minute or two while I am doing other tasks. However, the laptop runs 24/7, so there should be enough time overnight, when nothing else is running, for the current Zika WUs to finish. Anyway, once I have completed the 80 WUs and receive some with a more realistic estimated completion time, the problem should go away. Or perhaps there are some WUs which will never complete due to looping or another problem, and these will still error out even with a more realistic completion time?
Regards, Chris
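Chris's arithmetic can be checked with a short Python sketch. It assumes, as his post implies, that the elapsed time limit is 40 times the estimated run time; `allowed_time` and `LIMIT_FACTOR` are illustrative names, not anything defined by BOINC or World Community Grid.

```python
from datetime import timedelta

# Figures from the post: estimated run time per WU and the x40 factor,
# assumed here to be the multiple of the estimate allowed before abort.
ESTIMATE = timedelta(minutes=19, seconds=53)
LIMIT_FACTOR = 40


def allowed_time(estimate: timedelta, factor: int = LIMIT_FACTOR) -> timedelta:
    """Elapsed time allowed before the task is aborted, under the x40 assumption."""
    return estimate * factor


limit = allowed_time(ESTIMATE)
print(limit)  # 13:15:20

# Typical run time reported in the post: comfortably inside the limit.
print(timedelta(hours=8) < limit)  # True

# A task showing 10 h elapsed and 2 days remaining projects far past the limit.
projected = timedelta(hours=10) + timedelta(days=2)
print(projected > limit)  # True
```

This is consistent with the pattern in the post: the 8-hour tasks succeed, while the ones whose remaining-time estimate ballooned to days cannot finish inside roughly 13 hours.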

Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0
Quote: "... perhaps there are some WUs which will never complete due to looping or another problem. and these will still error out, even though the completion time is more realistic"
Thanks for the extra detail, Chris. If you have the time, it might be worthwhile to check those system temps and run the memtest86 diagnostics SekeRob recommended, in case your "perhaps" is correct, as the observation that most of your Zika workunits run successfully suggests.

SekeRob
Master Cruncher · Joined: Jan 7, 2013 · Post Count: 2741
Smells like some on-device investigation might bring things to light. Or, to name another possibility, it is a particular compound type where things fall over... Whilst we know there is a variable number of jobs in a task [a task with tasks in it], to us outside mortals there's no telling which compounds are being run against what I presume is the same target for all the tasks in a task [why couldn't these subs have been called jobs or some other label instead?]

i007008
Cruncher · Joined: Sep 16, 2005 · Post Count: 21
That's useful, Tony, thanks. I'll have a look later on, when I have more time.

9maMSSuNWXgttyKdZhMemeXmEx8
Senior Cruncher · Puerto Rico · Joined: Feb 20, 2008 · Post Count: 191
Quote: "The same error was reported in beta as well, on a slow Intel Atom X5-z8500, but there the dump timestamp is a few tens of minutes after the last log entry."
Yes, that was me. As a temporary solution I stopped the PC from getting GPU tasks from other projects, because those throttle down the processor, and that seems to be what makes this project's workunits error out. I can also confirm this on my i7-4700MQ machine: while running a GPU project, it produced 3 workunits that failed with the time limit exceeded error. So this seems to affect PCs that throttle the CPU because of thermal issues, for example when running GPU projects. Since I disabled the GPU project, none of the machines has produced errors. By the way, I have plenty of Atom machines: 3 Atom Z3735G, 1 Atom X5-Z8500 and, since yesterday, 1 Atom X5-Z8300, which is performing great with this project, taking approximately 13 hours per workunit.
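Why throttling could trigger the error can be illustrated with a simple model. This is an assumption, not the BOINC client's logic: the allowed time is fixed up front from the benchmark, while the actual run time stretches by the inverse of the fraction of full speed the throttled CPU delivers. The function names and the figures below are hypothetical.

```python
def run_time_hours(nominal_hours: float, speed_fraction: float) -> float:
    """Wall-clock run time when the CPU delivers only `speed_fraction` of benchmark speed."""
    if not 0 < speed_fraction <= 1:
        raise ValueError("speed_fraction must be in (0, 1]")
    return nominal_hours / speed_fraction


def exceeds_limit(nominal_hours: float, speed_fraction: float, limit_hours: float) -> bool:
    """True if a throttled run passes a limit that was fixed from the unthrottled benchmark."""
    return run_time_hours(nominal_hours, speed_fraction) > limit_hours


# Hypothetical figures: a task needing 8 h at full speed, with a 13.25 h limit.
print(exceeds_limit(8.0, 1.0, 13.25))  # False: 8 h at full speed
print(exceeds_limit(8.0, 0.5, 13.25))  # True: 16 h when throttled to half speed
```

Under this model, a machine that is fine at full speed can still hit the limit if it spends long stretches heavily throttled, which fits the observation that the errors stopped once the GPU project was disabled.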

SekeRob
Master Cruncher · Joined: Jan 7, 2013 · Post Count: 2741
The allowed time (in seconds) is calculated from the benchmark, not from the momentary CPU cycles, i.e. if you do this manually, re-run the benchmark too!
----------------------------------------
Not sure about your calculation, or what the 10 is. If that's the 'originally' projected hours, then after a re-benchmark you'd have been allowed (3.7/2.5)*10 = ~14.8 hours.
Edit: But am I responding to a spammer? Why the weblink in the main post body?... And yes indeed, the post is a full snip of a May 21 post up in this thread. Abort, abort.
[Edit 1 times, last edit by SekeRob* at Jun 20, 2016 10:39:34 AM]
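SekeRob's arithmetic as a Python sketch. It assumes, as the formula implies, that the allowed time is inversely proportional to the benchmarked speed, with 3.7 being the speed the original 10-hour projection was based on and 2.5 the lower speed a re-benchmark would measure. The units are not stated in the post, and `rescaled_allowance` is a hypothetical name.

```python
def rescaled_allowance(original_hours: float,
                       speed_at_projection: float,
                       speed_after_rebenchmark: float) -> float:
    """Rescale an allowed run time when the benchmarked speed changes.

    Assumes allowed time is inversely proportional to benchmarked speed,
    so a lower re-benchmark score yields a proportionally longer allowance.
    Only the ratio of the two speeds matters, so their units cancel out.
    """
    if speed_after_rebenchmark <= 0:
        raise ValueError("speed_after_rebenchmark must be positive")
    return original_hours * speed_at_projection / speed_after_rebenchmark


print(round(rescaled_allowance(10, 3.7, 2.5), 1))  # 14.8
```

This is why re-benchmarking matters after changing the machine's speed manually: the allowance is computed once from the benchmark, not from the speed the CPU happens to run at.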