World Community Grid Forums
Thread Status: Active Total posts in this thread: 70
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 286 Status: Offline
Received 2 - 1 each on 2 machines
1) After 6 hrs 41 mins: 37.15% complete, 2 days 20 hrs 32 mins to completion.
Update - Error - cpu time limit has been exceeded
2) After 9 hrs 45 mins: 54.18% complete, 1 day 20 hrs 21 mins to completion.
Update - Error - cpu time limit has been exceeded

In another batch I snagged 3 more:
3) After 1 hr 50 mins: 10.23% complete, 4 days 22 hrs 56 mins to completion :-)
Update - needed to restart the machine. It went from 46.98% complete to 0.00% after the reboot.
Update - 18 hrs: "Killing job because cpu time limit has been exceeded", but Valid :-) Wingman completed in 7.55 hrs.
4) After 3 hrs 33 mins: 19.80% complete, 8 hrs 21 mins to completion.
Update - 18 hrs: "Killing job because cpu time limit has been exceeded", but Valid :-) Wingman completed in 16.14 hrs.
5) Not yet started.
Update - Completed and Valid in 5.56 hrs.

The time limit should not be hard coded, but should vary with CPU speed... Allow slower processors more time.
No checkpoint after 46% complete :-(
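A CPU time limit that scales with processor speed, as suggested above, could be sketched roughly like this. It is purely hypothetical: the 18-hour base is taken from the kill times reported in this thread, while the reference speed (`REF_GFLOPS`), the cap (`max_factor`) and the function name are made-up assumptions, not how CEP2 or BOINC actually set the limit.

```python
# Hypothetical sketch: scale a CPU time limit by relative host speed.
# The 18 h base is the kill time reported in this thread; REF_GFLOPS and
# max_factor are invented illustration values, not CEP2/BOINC settings.

BASE_LIMIT_HOURS = 18.0  # hard limit observed in this thread
REF_GFLOPS = 4.0         # assumed speed of the host the limit was tuned for


def scaled_cpu_limit(host_gflops: float,
                     base_limit_hours: float = BASE_LIMIT_HOURS,
                     ref_gflops: float = REF_GFLOPS,
                     max_factor: float = 3.0) -> float:
    """Return a CPU time limit in hours that grows for slower hosts.

    A host at half the reference speed gets twice the base limit. The factor
    is floored at 1.0 (fast hosts keep the base limit) and capped at
    max_factor so a very slow or mis-benchmarked host cannot run forever.
    """
    if host_gflops <= 0:
        raise ValueError("host_gflops must be positive")
    factor = ref_gflops / host_gflops
    factor = min(max(factor, 1.0), max_factor)
    return base_limit_hours * factor


if __name__ == "__main__":
    for gflops in (8.0, 4.0, 2.0, 1.0):
        print(f"{gflops:4.1f} GFLOPS -> limit {scaled_cpu_limit(gflops):.1f} h")
```

With these assumed numbers, a host benchmarked at half the reference speed would get 36 hours instead of 18, and the cap stops a badly benchmarked host from holding a task indefinitely.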
Run Time Stats https://grassmere-productions.no-ip.biz/
[Edit 8 times, last edit by TonyEllis at May 26, 2016 10:23:55 AM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Interesting, as in the past I thought that <fraction_done_exact/> did not work on the CEP2 production work, but now it clearly does... It depends on your definition of "working"! One of mine that was estimated at over 18 hours with fraction_done_exact has just completed in 2.25 hours, but exited with RC = 0x1 in Job #0; it is Valid already.
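For reference, BOINC's `<fraction_done_exact/>` option (set per application in `app_config.xml`) makes the client base its remaining-time estimate solely on the fraction done reported by the app. A simplified model of that idea, not the client's actual code, is plain linear extrapolation. The example plugs in TonyEllis's first task above (6 hrs 41 mins at 37.15%).

```python
# Simplified model of a fraction-done-only runtime estimate (the idea behind
# BOINC's <fraction_done_exact/>): assume progress is linear in elapsed time.
# Illustration only, not the BOINC client's actual code.


def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total = elapsed / fraction, remaining = total - elapsed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total - elapsed_hours


if __name__ == "__main__":
    # TonyEllis's task 1: 6 hrs 41 mins elapsed at 37.15% complete.
    elapsed = 6 + 41 / 60
    remaining = estimate_remaining_hours(elapsed, 0.3715)
    print(f"estimated remaining: {remaining:.1f} h, total: {elapsed + remaining:.1f} h")
```

That gives about 11.3 hours remaining and roughly 18 hours in total. The weakness is that linear extrapolation has no way to anticipate a task ending early, like the one above that finished in 2.25 hours after an estimate of over 18.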
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
The definition for CEP2 is that the progress percentage is buggered [variable, unpredictable job durations], therefore you won't ever get it to give a correct estimate, but rather 4:04 days, or "and all is much better... 9 to 12.5 hours".
Does that clarify your understanding of mine ;?
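To illustrate why variable job durations wreck a percentage-based projection: the log later in this thread shows a workunit split into several jobs ("Number of jobs = 5"). Assuming, purely for illustration, that each job counts as an equal share of the percentage (the thread does not say how CEP2 actually computes it), a sketch with invented job lengths shows how far off the projection can be.

```python
# Hypothetical illustration: a workunit made of several jobs whose progress
# is reported as an equal 1/n share per job, even though job durations vary.
# The job durations are invented; CEP2's real progress formula is not known here.

from typing import List


def reported_fraction(job_hours: List[float], elapsed: float) -> float:
    """Fraction done if each job counts as 1/n, with linear progress inside a job."""
    n = len(job_hours)
    done = 0.0
    for hours in job_hours:
        if elapsed >= hours:
            done += 1.0          # whole job finished
            elapsed -= hours
        else:
            done += elapsed / hours  # partial progress in the current job
            break
    return done / n


if __name__ == "__main__":
    jobs = [1.0, 6.0, 2.0, 8.0, 1.0]  # invented per-job CPU hours (18 h total)
    for t in (1.0, 4.0, 9.0):
        f = reported_fraction(jobs, t)
        projected_total = t / f
        print(f"after {t:.0f} h: {f:.0%} done, "
              f"projected total {projected_total:.1f} h (actual {sum(jobs):.0f} h)")
```

With these made-up numbers the projection is 5 hours after 1 hour, about 13.3 hours after 4 hours and 15 hours after 9 hours, while the task actually needs 18.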
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
You're clearly an optimist
hiimebm
Senior Cruncher United States Joined: Oct 19, 2014 Post Count: 305 Status: Offline
These things DO NOT checkpoint. Turned on my laptop just now and the WU reset to 0% from 5.5%. :(
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Turning off a device while CEP2 is running is not the smartest move, at least not before checking whether a checkpoint was written [there are only 5 in the latest production batches; there used to be 8]. The experienced hibernate/sleep their devices, so they won't lose progress.
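On the checkpoint point: anything computed since the last checkpoint is lost when a task restarts. Below is a minimal Python sketch of job-level checkpointing, assuming state is saved only when a job finishes, which would fit the "No state to restore" and "CPU time has been restored" messages in the log later in the thread. The file name, JSON layout and function names are hypothetical, not CEP2's real implementation. The checkpoint is written to a temporary file and then renamed over the old one with `os.replace`, so an interrupted write should not leave a half-written checkpoint behind.

```python
# Hypothetical sketch of job-level checkpointing for a multi-job workunit.
# State (next job index + accumulated CPU seconds) is saved only after a job
# finishes, so work inside an unfinished job is lost on restart.
# File name and JSON layout are illustrative, not CEP2's real format.

import json
import os
import tempfile
from typing import Callable, Tuple

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_state(path: str = CHECKPOINT) -> Tuple[int, float]:
    """Return (next_job, cpu_seconds); start from scratch if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, 0.0  # nothing to restore: start from the beginning
    with open(path) as f:
        state = json.load(f)
    return state["next_job"], state["cpu_seconds"]


def save_state(next_job: int, cpu_seconds: float, path: str = CHECKPOINT) -> None:
    """Write to a temp file in the same directory, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_job": next_job, "cpu_seconds": cpu_seconds}, f)
    os.replace(tmp, path)


def run_workunit(n_jobs: int, run_job: Callable[[int], float],
                 path: str = CHECKPOINT) -> float:
    """Resume from the last checkpoint; run_job(k) returns CPU seconds used by job k."""
    next_job, cpu = load_state(path)
    for job in range(next_job, n_jobs):
        cpu += run_job(job)
        save_state(job + 1, cpu, path)  # checkpoint only at job boundaries
    return cpu
```

With checkpoints only at job boundaries, a shutdown halfway through a long job throws away everything since the previous job finished, which is why checking for a recent checkpoint (or using hibernate/sleep) matters.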
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Of the 5 on the laptop, the first checkpointed only -after- 11:18 hours... All are now at ~13:30 hours, i.e. the other 4 are going for bust if nothing happens in the next 4.5 hours.
Got 3 more brand new ones on the desktop just 1 hour ago... running high priority all by themselves... What to expect if they arrive with a 4-day runtime projection? :P
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Huh, that's weird. I got one that was projected at 36 hours, but the percentage suggested it would actually complete in 18. Instead, it completed in 1.6 hours.
Looks like the bulk of the work was skipped. Did I just happen to get a completely unfeasible dataset that made the application decide to abort (smartly avoiding wasted time), or is this unexpected?
-------
BETA_E236438_146_S.396.C44H22N2S6.CGPWUKAIJXKZGD-UHFFFAOYSA-N.14_s1_14a_0--
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[20:50:21] Number of jobs = 5
[20:50:21] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[22:31:48] Finished Job #0
[22:31:48] Starting job 1,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #1
[22:31:48] Starting job 2,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #2
[22:31:48] Starting job 3,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #3
[22:31:48] Starting job 4,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #4
22:31:49 (620): called boinc_finish
</stderr_txt>
]]>
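A quick way to see whether a result really ran all its jobs, or, as here, finished job 0 with a non-zero return code and skipped the rest, is to scan the stderr text for those messages. A minimal Python sketch, assuming only the message formats visible in the log above; CEP2 may print other messages, which it simply ignores.

```python
# Minimal sketch: summarize a CEP2-style stderr log. Assumes the message
# formats seen in this thread ("Number of jobs = N", "Finished Job #k",
# "Skipping Job #k", "exited with RC = 0x..", "CPU time has been restored to X");
# any other messages are ignored.

import re
from typing import Dict

JOBS_RE = re.compile(r"Number of jobs = (\d+)")
FINISHED_RE = re.compile(r"Finished Job #(\d+)")
SKIPPED_RE = re.compile(r"Skipping Job #(\d+)")
RC_RE = re.compile(r"exited with RC = (0x[0-9A-Fa-f]+)")
CPU_RE = re.compile(r"CPU time has been restored to (\d+(?:\.\d+)?)")

SAMPLE_LOG = """\
INFO: No state to restore. Start from the beginning.
[20:50:21] Number of jobs = 5
[20:50:21] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[22:31:48] Finished Job #0
[22:31:48] Starting job 1,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #1
[22:31:48] Starting job 2,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #2
[22:31:48] Starting job 3,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #3
[22:31:48] Starting job 4,CPU time has been restored to 5986.070372.
[22:31:48] Skipping Job #4
22:31:49 (620): called boinc_finish
"""


def summarize(stderr: str) -> Dict[str, object]:
    """Collect job count, finished/skipped job numbers, exit codes and CPU time."""
    jobs = JOBS_RE.search(stderr)
    cpu_values = [float(x) for x in CPU_RE.findall(stderr)]
    return {
        "n_jobs": int(jobs.group(1)) if jobs else None,
        "finished": [int(k) for k in FINISHED_RE.findall(stderr)],
        "skipped": [int(k) for k in SKIPPED_RE.findall(stderr)],
        "exit_codes": RC_RE.findall(stderr),
        # CPU time logged at the start of the last job; later time is not in the log.
        "cpu_hours": (cpu_values[-1] / 3600) if cpu_values else 0.0,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

On the log above it reports 5 jobs, job 0 finished, jobs 1 to 4 skipped, exit code 0x1, and about 1.66 CPU hours, which matches the 1.6 hours the task took.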
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Yes.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I got 1 WU on my iMac this morning. It started at 5:40 CST. The first checkpoint was at 10:50 CST. The job ended without error after job 3 at 11:30 CST. (I was out when it checkpointed, so I could not do a suspend and restart.)