World Community Grid Forums
Thread Status: Active Total posts in this thread: 114
ccandido
Senior Cruncher Joined: Jun 22, 2011 Post Count: 182 Status: Offline
Can't get any jobs...
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline
It would be novel if the task then continues with #5... did it? If not, seen jobs call it a day anywhere from #2 to #7, if there are 8 in there.
Did not continue with job #5. If you look at my first post that has the result log it will show how they ended.
These BETAs have 5 jobs max, so the last job is called #4.
[Edit 1 times, last edit by Crystal Pellet at Feb 25, 2016 6:16:04 PM]
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
It would be novel if the task then continues with #5... did it? If not, seen jobs call it a day anywhere from #2 to #7, if there are 8 in there.
Did not continue with job #5. If you look at my first post that has the result log it will show how they ended.
These BETAs have 5 jobs max, so the last job is called #4.
Still would like an explanation of why these tasks skipped #4. I have others that have gone past 12 hours of runtime. Maybe Keith will chime in.
Just checked 2 more that went PV a few minutes ago with 13-14+ hours of runtime. They also skipped job #4.
FWIW I have 2 of the original BETAs from batch E236224 that are PV. They do have 8 jobs but #7 was skipped in both of them.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 5 times, last edit by nanoprobe at Feb 25, 2016 6:39:06 PM]
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
OK, I had not looked over any logs [I have not had any tasks, which is no surprise as a slow release was announced]. CEP2 has a long history, going back to project start, of skipping jobs at some point for many tasks, so #4 being skipped, as a later job considered superfluous, would not alert me to anything... log-normal.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
1 WU errored out at about 45 min.
Result Log
Result Name: BETA_ E236293_ 524_ S.316.C27H25N5O10Si1.PCPQZFNRAVTAOL-UHFFFAOYSA-N.1_ s1_ 14_ 2--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:51:50] Number of jobs = 5
[17:51:50] Starting job 0,CPU time has been restored to 0.000000.
[17:51:50] Starting new Job
[17:51:50] Qink name = fldman
[17:51:51] Qink name = gesman
[17:51:52] Qink name = scfman
[18:35:03] Qink name = anlman
[18:35:03] Qink name = drvman
Application exited with RC = 0x100
[18:36:11] Finished Job #0
called boinc_finish
</stderr_txt>
]]>
iMac i5 2400s El Capitan
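The three numbers in the `process exited with code 195 (0xc3, -61)` line are the same value written three ways: decimal, hexadecimal, and read as a signed 8-bit integer. A minimal Python sketch of that arithmetic follows; the helper name `as_signed_byte` is made up for illustration and is not part of BOINC or the project application.

```python
def as_signed_byte(code: int) -> int:
    """Reinterpret an exit code in 0..255 as a signed 8-bit integer."""
    if not 0 <= code <= 255:
        raise ValueError("expected a value in 0..255")
    # Values of 128 and above wrap around to the negative range.
    return code - 256 if code >= 128 else code


exit_code = 195  # taken from the <message> line in the log above
print(exit_code, hex(exit_code), as_signed_byte(exit_code))  # prints: 195 0xc3 -61
```

This only shows that the three values agree with each other; it says nothing about why the task exited.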
kinski
Advanced Cruncher Joined: Nov 25, 2006 Post Count: 104 Status: Offline
I've received 3 new betas.
All running without issues, currently at 4h50min/7h57min/8h07min. All survived suspend and resume without issues.
BETA_ E236293_ 884_ S.314.C34F5H23N6.SVHTUBXNTHJSRG-UHFFFAOYSA-N.7_ s1_ 14_ 1--
BETA_ E236293_ 592_ S.316.C34H25N9O3.AGSMWCGAJKBULA-UHFFFAOYSA-N.6_ s1_ 14_ 1--
BETA_ E236293_ 707_ S.322.C34H31N9O3.YBWVQPFHZAFFJG-UHFFFAOYSA-N.6_ s1_ 14_ 0--
Xeon L5640 and E5649 using Ubuntu 15.10 x64
edit: checkpoint didn't survive physical restart, all started from 0 again.
[Edit 1 times, last edit by [xs]anubis at Feb 26, 2016 8:39:32 AM]
pvh513
Senior Cruncher Joined: Feb 26, 2011 Post Count: 260 Status: Offline
Had 2 WUs from the new batch. Did a couple of suspend/resume cycles, but things appeared to be going OK. The checkpoint file was written after the first job was finished (which is after around 60-75% of the total CPU time?). Both WUs skipped job #4. They are now in PV.
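As a rough check of that 60-75% estimate, the CPU-time values from the second result log that nanoprobe posts further down in this thread can be plugged in. This is only a sketch based on that single log. It assumes the "CPU time has been restored to" value printed when job 1 starts is the CPU time spent on job 0, and that the value printed when job 4 starts is the total, since job 4 was skipped.

```python
# CPU-time values in seconds, copied from that log's "restored to" lines.
cpu_after_job0 = 29807.859875  # printed when job 1 starts
cpu_total = 46112.070388       # printed when job 4 starts (job 4 was skipped)

# Fraction of the task's CPU time that had been used when job 0 finished.
share = cpu_after_job0 / cpu_total
print(f"Job 0 used about {share:.1%} of the total CPU time")
```

For that one task the share comes out at roughly 65%, which is inside the quoted range. Other tasks may differ.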
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Got a couple of those (>1 per thread actually) just when the other Beta test was announced. No problems after 11 hours (~50%) of running. Launched them on all 8 cores at the same time, they survived it.
Initial time left estimate said 5 hours. This client hasn't done any CEP2 work before, apart from the faulty Betas. Checkpoints are rare though - the one time I'm not running on a VM with snapshot option, I get CEP2 betas :D
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
FWIW I found some curious result logs for tasks that are PV. Like this:
Result Log
Result Name: BETA_ E236293_ 749_ S.314.C35H23N7O4.DXRSJUHBMWYEGL-UHFFFAOYSA-N.10_ s1_ 14_ 1--
<core_client_version>7.4.36</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:18:26] Number of jobs = 5
[16:18:26] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[17:42:46] Finished Job #0
[17:42:46] Starting job 1,CPU time has been restored to 5023.949805.
[17:42:46] Skipping Job #1
[17:42:46] Starting job 2,CPU time has been restored to 5023.949805.
[17:42:46] Skipping Job #2
[17:42:46] Starting job 3,CPU time has been restored to 5023.949805.
[17:42:46] Skipping Job #3
[17:42:46] Starting job 4,CPU time has been restored to 5023.949805.
[17:42:46] Skipping Job #4
17:42:47 (6060): called boinc_finish
</stderr_txt>
]]>
Skipped the last 4 jobs. And this:
Result Log
Result Name: BETA_ E236293_ 98_ S.314.C35F1H23N8S1.WMQVXFCVIJUQGR-UHFFFAOYSA-N.4_ s1_ 14_ 0--
<core_client_version>7.4.36</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:19:29] Number of jobs = 5
[16:19:29] Starting job 0,CPU time has been restored to 0.000000.
19:01:39 (912): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
19:03:11 (3452): No heartbeat from core client for 30 sec - exiting
19:03:12 (3452): No heartbeat from core client for 30 sec - exiting
19:03:13 (3452): No heartbeat from core client for 30 sec - exiting
19:03:14 (3452): No heartbeat from core client for 30 sec - exiting
19:03:15 (3452): No heartbeat from core client for 30 sec - exiting
19:03:16 (3452): No heartbeat from core client for 30 sec - exiting
19:03:17 (3452): No heartbeat from core client for 30
19:20:35 (3452): No heartbeat from core client for 30 sec - exiting
[19:20:36] Number of jobs = 5
[19:20:36] Starting job 0,CPU time has been restored to 0.000000.
19:20:36 (3452): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[19:20:53] Number of jobs = 5
[19:20:53] Starting job 0,CPU time has been restored to 0.000000.
[03:59:33] Finished Job #0
[03:59:33] Starting job 1,CPU time has been restored to 29807.859875.
[04:20:59] Finished Job #1
[04:20:59] Starting job 2,CPU time has been restored to 31083.324051.
[04:39:37] Finished Job #2
[04:39:37] Starting job 3,CPU time has been restored to 32192.257159.
Application exited with RC = 0x1
[08:33:26] Finished Job #3
[08:33:26] Starting job 4,CPU time has been restored to 46112.070388.
[08:33:26] Skipping Job #4
08:33:28 (1556): called boinc_finish
</stderr_txt>
]]>
Didn't post the whole log because it was huge but the no heartbeat ran from 19:03:11-19:20:30 and then it ran normally, skipping job #4. Would love to hear an explanation for this one.
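To see at a glance which jobs a task finished and which it skipped, logs like the two above can be summarised with a few lines of Python. This is a minimal sketch that assumes the `Finished Job #N` and `Skipping Job #N` wording shown in these logs. The function name `summarize_jobs` is hypothetical, and this is not a World Community Grid or BOINC tool.

```python
import re


def summarize_jobs(stderr_text: str) -> dict:
    """Return the job numbers a log reports as finished and as skipped."""
    finished = [int(n) for n in re.findall(r"Finished Job #(\d+)", stderr_text)]
    skipped = [int(n) for n in re.findall(r"Skipping Job #(\d+)", stderr_text)]
    return {"finished": finished, "skipped": skipped}


# Excerpt copied from the first log above.
log = """\
[16:18:26] Number of jobs = 5
[16:18:26] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[17:42:46] Finished Job #0
[17:42:46] Skipping Job #1
[17:42:46] Skipping Job #2
[17:42:46] Skipping Job #3
[17:42:46] Skipping Job #4
"""

print(summarize_jobs(log))  # {'finished': [0], 'skipped': [1, 2, 3, 4]}
```

It only counts what the log reports and does not explain why a job was skipped.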
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I've had five so far, all on the same box, that ran and reported back -- four errored out, but one completed, and has been in Pending Validation (alongside a wingman) since Feb 22 (normal for beta to take so long to validate?).
Now have one each waiting to run or running on each of my three primary boxes, so we'll see how the other two "like" them.