| World Community Grid Forums
Thread Status: Active Total posts in this thread: 8
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
I believe this is the first error I have encountered on this project. It tells me the maximum time allowed was exceeded, yet the other two completed units ran even longer.
| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| OET1_0000331_xEBGP-OM_rig_0773_3-- | - | In Progress | 2/15/15 12:02:30 | 2/19/15 00:02:30 | 0.00 | 0.0 / 0.0 |
| OET1_0000331_xEBGP-OM_rig_0773_2-- | 719 | Error | 2/15/15 03:08:30 | 2/15/15 12:02:25 | 4.41 | 51.6 / 0.0 <= Mine |
| OET1_0000331_xEBGP-OM_rig_0773_1-- | 719 | Pending Verification | 2/11/15 13:08:55 | 2/15/15 03:08:22 | 16.88 | 199.1 / 0.0 |
| OET1_0000331_xEBGP-OM_rig_0773_0-- | 719 | Pending Verification | 2/11/15 13:08:49 | 2/12/15 19:36:25 | 7.90 | 199.6 / 0.0 |

Here is the result file:

Result Log
Result Name: OET1_0000331_xEBGP-OM_rig_0773_2--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[01:23:05] Number of tasks = 1
[01:23:05] Running task 0,CPU time at start of task 0 was 0.000000
[01:23:05] ./ZINC00057319_1.pdbqt size = 28 10
../../projects/www.worldcommunitygrid.org/oet1.xEBGP-OM_rig.pdbqt size = 2391 0
</stderr_txt>

Result Log
Result Name: OET1_0000331_xEBGP-OM_rig_0773_1--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[03:30:43] Number of tasks = 1
[03:30:43] Running task 0,CPU time at start of task 0 was 0.000000
[03:30:43] ./ZINC00057319_1.pdbqt size = 28 10
../../projects/www.worldcommunitygrid.org/oet1.xEBGP-OM_rig.pdbqt size = 2391 0
[21:09:22] Finished task #0 cpu time used 60750.332658
21:09:22 (21897): called boinc_finish
</stderr_txt>

Result Log
Result Name: OET1_0000331_xEBGP-OM_rig_0773_0--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[10:46:56] Number of tasks = 1
[10:46:56] Running task 0,CPU time at start of task 0 was 0.000000
[10:46:56] ./ZINC00057319_1.pdbqt size = 28 10
../../projects/www.worldcommunitygrid.org/oet1.xEBGP-OM_rig.pdbqt size = 2391 0
[18:46:31] Finished task #0 cpu time used 28452.830862
18:46:31 (21589): called boinc_finish
</stderr_txt>

The CPU time / elapsed time in my results status is listed as 4.41/4.58. The system is a Core2Duo at 2.66 GHz running Linux Mint.
Cheers
Sgt. Joe
*Minnesota Crunchers*
[Edited 1 time, last edit by Sgt.Joe at Feb 15, 2015 1:29:37 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
If the device benchmark is optimistic compared with the kind of work the task actually does (integer, float or both), you can end up being allowed less time than your wingman.
Linux? I've always had my doubts about the Dhrystone integer benchmark there coming out two to four times higher than on Windows, whereas the Whetstone float results are mostly closely aligned.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
The current limits on a 462 task being queued are shown below; the bound value is 40 times the current average estimate (all platforms):
<rsc_fpops_est>20063117540850.000000</rsc_fpops_est>
<rsc_fpops_bound>802524701634000.000000</rsc_fpops_bound>
The higher the benchmark, the shorter the time allowed before "Maximum elapsed time exceeded" kicks in.
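As a rough illustration, the two header values above differ by exactly a factor of 40, and the allowed wall-clock time shrinks as the benchmark grows. The sketch assumes the client's limit is the bound divided by the host's per-core FLOPS benchmark; the function names and the two benchmark figures are hypothetical, not measurements from any host in this thread.

```python
# Sketch under an assumption: the client's wall-clock limit is taken to be
# rsc_fpops_bound divided by the host's per-core FLOPS benchmark.
# Function names and benchmark values are hypothetical.

RSC_FPOPS_EST = 20063117540850.0     # from the quoted task header
RSC_FPOPS_BOUND = 802524701634000.0  # from the quoted task header


def bound_multiplier(est: float, bound: float) -> float:
    """How many times the average estimate the bound allows."""
    return bound / est


def max_elapsed_hours(bound: float, benchmark_flops: float) -> float:
    """Wall-clock hours before 'Maximum elapsed time exceeded' (assumed formula)."""
    return bound / benchmark_flops / 3600.0


if __name__ == "__main__":
    print(f"bound / estimate = {bound_multiplier(RSC_FPOPS_EST, RSC_FPOPS_BOUND):.1f}")
    # Two made-up per-core benchmark results, in FLOPS.
    for flops in (2.0e9, 4.0e9):
        hours = max_elapsed_hours(RSC_FPOPS_BOUND, flops)
        print(f"benchmark {flops / 1e9:.1f} GFLOPS -> {hours:.1f} h allowed")
```

Under this model, doubling the benchmark halves the allowed time, which is why an optimistic benchmark on a host that actually runs slower is the risky case.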
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
An old post by knreed on what can be done if you fear the allowed time is not enough:
http://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=178898
Of course, re-benchmarking might also bring an issue to light, such as the benchmark having been run at, e.g., 3.6 GHz while the device actually runs at 2.4 GHz. That cuts the effective allowed time by 33%. One more note: some output files also grow over time, and there is an upload file size limit as well, just in case.
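The 33% figure for a 3.6 GHz benchmark on a 2.4 GHz device can be checked with a simple ratio. This is a simplified model that assumes task throughput scales linearly with clock speed and that the time limit is fixed by the benchmarked speed; the function name is hypothetical.

```python
# Simplified model (assumption): task throughput scales linearly with clock
# speed, and the time limit is fixed by the benchmark measured at bench_ghz.

def usable_fraction(bench_ghz: float, actual_ghz: float) -> float:
    """Share of the intended work budget that can finish before the cutoff.

    The limit is bound / benchmark_speed seconds, but real progress per second
    is proportional to actual_ghz, so only actual/bench of the bound gets done.
    """
    return actual_ghz / bench_ghz


if __name__ == "__main__":
    frac = usable_fraction(bench_ghz=3.6, actual_ghz=2.4)
    print(f"usable: {frac:.0%}, cut: {1 - frac:.0%}")
```

2.4 / 3.6 leaves two thirds of the intended budget, i.e. a cut of about a third.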
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
Quote: "Current restriction on a 462 task being queued is, where the bound value is 40 times the current average estimate [all platforms]."
OK, thanks. That would make sense, since I had a whole slug (more than 150) of real shorties before this one. The current average estimate must have been under 10 minutes; if my math is right, it would have been about 7 minutes. Unfortunate, but the third wingman should take care of the validation process.
Cheers
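The "about 7 minutes" figure follows from dividing the elapsed time at which the task was stopped by the bound multiplier. This sketch assumes the same 40x bound applied to this task and that the host ran at roughly its benchmarked speed; the function name is hypothetical.

```python
# Back-of-envelope check, assuming the same 40x bound applied to this task
# and that the host ran at roughly its benchmarked speed.

BOUND_MULTIPLIER = 40.0  # from the earlier post about the 462 tasks


def implied_estimate_minutes(elapsed_hours_at_abort: float,
                             multiplier: float = BOUND_MULTIPLIER) -> float:
    """Estimated runtime implied by where the task was cut off."""
    return elapsed_hours_at_abort * 60.0 / multiplier


if __name__ == "__main__":
    # 4.58 h is the elapsed time reported for the errored task.
    print(f"implied estimate: {implied_estimate_minutes(4.58):.1f} minutes")
```

4.58 h is 274.8 minutes, and 274.8 / 40 is about 6.9 minutes.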
Sgt. Joe
*Minnesota Crunchers*
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
What I'd like to emphasize (I've tried before) is that client v7 does not do any runtime adaptation; it does not matter whether it has run 150 tasks or 10. The projection entirely follows the server average stored in <rsc_fpops_est> and the fairly static benchmark values, with whatever lag there is between generation, distribution and return.
During a wave of short work, any longer new work quickly gets an adapted header FPOPS; conversely, after a wave of long work, the header adaptation is slow to respond, so short work will at first arrive with long-work estimates. Another main snag in this story is how much work is outstanding across the whole pool of clients and how deep the cache is in each host's buffer (which in turn affects credit at the individual host level at different moments as things evolve). As uplinger noted, he has upped the per-core in-progress (IP) limit from 25 to 35, because when a period of shorts flips to longs, unlimited caching could lead to enormous overbuffering: the HCMD2 experience revisited. Also, since DCF is locked at 1.000000, the cache responds less aggressively and simply follows the server runtime means. With the functioning DCF of client v6 and earlier, shorter runtimes were slow to bring the DCF down, while longer runtimes made it react very quickly (overbuffering protection by design). If you want the active DCF, downgrade to v6, but then it is even more strongly advisable not to keep a multi-day buffer. That does not change the risk of exceeding the maximum time. If the benchmark is too optimistic, a host with variable runtimes is more at risk. An active DCF would have countered that quickly; with v7 it is countered much less. Sorry if causing , no, it is not simple at all.
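A toy model of the DCF asymmetry described above may help. It assumes the v6-style behaviour can be approximated as "jump straight up to the observed ratio on an overrun, move 10% of the way down on an underrun"; the real client's thresholds and weights differ, and every name here is hypothetical. The runtime estimate is taken as rsc_fpops_est / benchmark FLOPS × DCF, with DCF pinned at 1.0 under v7 as described in the post.

```python
from typing import Iterable, List

# Toy model only: real BOINC v6 thresholds and weights are not reproduced here.


def estimated_runtime_s(fpops_est: float, flops: float, dcf: float = 1.0) -> float:
    """Runtime estimate in seconds; under v7 as described, dcf stays at 1.0."""
    return fpops_est / flops * dcf


def update_dcf(dcf: float, ratio: float, down_weight: float = 0.1) -> float:
    """ratio = actual runtime / uncorrected estimate for one finished task."""
    if ratio > dcf:
        return ratio  # overrun: react at once (overbuffering protection)
    return dcf + down_weight * (ratio - dcf)  # underrun: drift down slowly


def dcf_history(ratios: Iterable[float], dcf: float = 1.0) -> List[float]:
    """DCF after each finished task, starting from the given value."""
    history = []
    for ratio in ratios:
        dcf = update_dcf(dcf, ratio)
        history.append(dcf)
    return history


if __name__ == "__main__":
    # Five short tasks (10% of estimate), then two long ones (3x estimate).
    hist = dcf_history([0.1] * 5 + [3.0] * 2)
    print([round(d, 3) for d in hist])
```

In this model, five shorts only bring the factor down to about 0.63, while a single long task snaps it straight to 3.0. With DCF locked at 1.0, neither response happens and the estimate simply follows the server header.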
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
It is an interesting discussion. There is no perfect feedback mechanism, because an unknown number of variables introduces randomness into the system. If a person were well versed in chaos theory, they might be able to buffer the feedback loops enough to dampen the swings. That would take a far deeper understanding than I possess. Suffice it to say, the situation I encountered is most probably quite uncommon.
Cheers
Sgt. Joe
*Minnesota Crunchers*
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0
The undersigned is actually quite good at chaos theory, the practice of it.