World Community Grid Forums
Thread Status: Active Total posts in this thread: 41
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
Progress increases until the task makes a new checkpoint at 7.8%. Last 2 lines in stderr.txt:
[08:43:21] INFO: Completed step 390000 of initial simulation
Writing checkpoint at step 390151.
and afterwards nothing at all. The process is running on a full core, but no new checkpoints are made and progress stays the same. The other BETAs were processed in about 3 hours and finished.
The restarted one is still running after 4.5 hours and it looks like it will never end.
Now running for more than 6 hours. Not endless yet, but waiting for "never ending" took me too long, so I decided to trash the wcg_checkpoint_##.ckp files except the 00 one. After restarting the task, checkpointing is working again and progress is OK.
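The workaround described above (setting aside every wcg_checkpoint_##.ckp file except the 00 one) could be scripted roughly as follows. This is only a sketch under assumptions: the file-name pattern is taken from the post, with ## read as a two-digit index; the function name `set_aside_checkpoints` and the directory arguments are hypothetical; and the files are moved to a backup directory rather than deleted, which is a more cautious variation on what the post describes. It is presumably safest to run it only while the task is not running.

```python
import re
import shutil
from pathlib import Path

# Assumption: "##" in wcg_checkpoint_##.ckp stands for a two-digit index.
CKP_PATTERN = re.compile(r"^wcg_checkpoint_(\d{2})\.ckp$")


def set_aside_checkpoints(slot_dir, backup_dir, keep=("00",)):
    """Move numbered checkpoint files out of slot_dir, except the kept indices.

    slot_dir and backup_dir are hypothetical paths supplied by the caller.
    Returns the sorted names of the files that were moved.
    """
    slot = Path(slot_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    # sorted() materialises the listing before any file is moved.
    for path in sorted(slot.iterdir()):
        match = CKP_PATTERN.match(path.name)
        if path.is_file() and match and match.group(1) not in keep:
            shutil.move(str(path), str(backup / path.name))
            moved.append(path.name)
    return moved
```

Moving instead of deleting keeps the suspect checkpoint files available in case a project technician asks for them.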
Crystal Pellet
> The restarted one is still running after 4.5 hours and it looks like it will never end.

> Do you see any activity in the designated "slots/" directory (files updated, timestamp updates), Crystal Pellet?

Some files were updated, but not all the expected ones, such as the checkpoint files.
i007008
Cruncher Joined: Sep 16, 2005 Post Count: 21 Status: Offline
Issue report:
Received 4 beta tasks with an estimated completion time of approximately 19 hours per task. Two beta tasks finished completely and correctly within 5 to 6 hours. The two remaining tasks were apparently running correctly.
Rebooted my i3 laptop to resolve an apparently never-ending UGM task; the UGM task fixed itself and completed correctly after some time. The beta tasks apparently restarted correctly: one 36% complete, the other 88% complete.
After about 6 or 7 hours, I noticed that the 2 beta tasks were clocking up "Progress" correctly, but the estimated time remaining stayed constant; it never decreased, not even by one second.
Rebooted the laptop again. Both beta tasks have started from 0%, but the time remaining is now decreasing correctly.
On both reboots I exited BOINC before rebooting. There is nothing relevant in the Event Log because of the reboot.
Task names:
11/03/2016 15:30:17 | World Community Grid | task BETA_AC0002_T000_F00043_S00001g_0 resumed by user
11/03/2016 15:30:17 | World Community Grid | task BETA_AC0002_T000_F00044_S00001j_0 resumed by user
Windows 8.1, i3 laptop, BOINC version 7.6.22 (x64)
Thanks guys.
Regards
Chris
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
I am looking into the checkpoint/resume issue causing tasks to hang. It looks like everyone who is reporting that is experiencing the issue on Windows. If anyone has had this issue on a platform other than Windows please post. Also can someone who is having the issue confirm whether or not they ran any workunits in the previous version and had the issue?
Thanks,
armstrdj
i007008
Hi armstrdj,
Just to confirm that these 4 beta units were the first I had received; I got none from the previous beta version. Sorry.
Crystal Pellet
> Also can someone who is having the issue confirm whether or not they ran any workunits in the previous version and had the issue?

In the BETA task of Feb 25 I did not have this issue. The checkpoints worked fine after a suspend/resume. The only problems then were that state files were written independently of the set checkpoint interval, and that the progress went to 100%, then back to about 77%, and then increased again.
Addition: Retested. 5 minutes after the resume (my WTD=60s), 5 files are changing:
state.cpt
state_prev.cpt
stderr.txt
wcg_checkpoint_00.ckp
wcg_hst1.state
The other ckp files are not changing.
[Edit 1 times, last edit by Crystal Pellet at Mar 11, 2016 5:33:10 PM]
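A check like the one described above (which files in the slot directory changed in the minutes after a resume) can be approximated by comparing file modification times against a cutoff. The following is a minimal sketch, not a project tool: the function name `recently_modified` and its parameters are hypothetical, and the slot directory path has to be supplied by the caller.

```python
import time
from pathlib import Path


def recently_modified(slot_dir, window_seconds=300, now=None):
    """Return the sorted names of files in slot_dir modified within the window.

    window_seconds defaults to 300 (5 minutes, as in the post above).
    now can be passed explicitly; otherwise the current time is used.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return sorted(
        p.name
        for p in Path(slot_dir).iterdir()
        if p.is_file() and p.stat().st_mtime >= cutoff
    )
```

Comparing the returned names against the full list of wcg_checkpoint_##.ckp files in the directory would show which checkpoint files have stopped being updated.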
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline
I did not process any of the Feb 25th HST beta work units.
[Edit 1 times, last edit by ca05065 at Mar 12, 2016 8:24:06 AM]
mito7
Advanced Cruncher Slovakia Joined: Oct 12, 2008 Post Count: 58 Status: Offline
Mine got stuck too after a stop/start (with LAIM off), but only after the next checkpoint.
BETA_AC0002_T000_F00068_S00001i
JSYKES
Senior Cruncher Joined: Apr 28, 2007 Post Count: 200 Status: Offline
Sorry guys, I can't add much to this thread other than to say that I had 3 WUs on two PCs running WProx64 (two other PCs running the same OS didn't receive any; targeted Beta?), and all ran straight through in 1.5 hrs, plus or minus a couple of minutes. It was all so quick that they had arrived and departed again before I was aware of the issue with Beta WUs!
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2167 Status: Offline
I received two WUs on one Linux system; both are Pending Validation at the moment; one Result Log looks like this:
Result Name: BETA_AC0002_T000_F00088_S00001a_1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
: Completed step 3965000 of initial simulation
[02:47:30] INFO: Completed step 3966000 of initial simulation
[02:47:31] INFO: Completed step 3967000 of initial simulation
[02:47:33] INFO: Completed step 3968000 of initial simulation
... 237 lines snipped ...
[02:54:23] INFO: Completed step 4206000 of initial simulation
[02:54:25] INFO: Completed step 4207000 of initial simulation
[02:54:26] INFO: Completed step 4208000 of initial simulation
Writing checkpoint at step 4208581.
[02:54:29] INFO: Completed step 4209000 of initial simulation
[02:54:30] INFO: Completed step 4210000 of initial simulation
[02:54:32] INFO: Completed step 4211000 of initial simulation
... 345 lines snipped ...
[03:04:24] INFO: Completed step 4557000 of initial simulation
[03:04:25] INFO: Completed step 4558000 of initial simulation
[03:04:27] INFO: Completed step 4559000 of initial simulation
Writing checkpoint at step 4559341.
[03:04:29] INFO: Completed step 4560000 of initial simulation
[03:04:30] INFO: Completed step 4561000 of initial simulation
[03:04:33] INFO: Completed step 4562000 of initial simulation
... 324 lines snipped ...
[03:14:22] INFO: Completed step 4887000 of initial simulation
[03:14:25] INFO: Completed step 4888000 of initial simulation
[03:14:26] INFO: Completed step 4889000 of initial simulation
Writing checkpoint at step 4889701.
[03:14:29] INFO: Completed step 4890000 of initial simulation
[03:14:30] INFO: Completed step 4891000 of initial simulation
[03:14:32] INFO: Completed step 4892000 of initial simulation
... 105 lines snipped ...
[03:17:58] INFO: Completed step 4998000 of initial simulation
[03:17:59] INFO: Completed step 4999000 of initial simulation
[03:18:01] INFO: Completed step 5000000 of initial simulation
[03:18:01] INFO: Finished initial simulation.
[03:18:02] INFO: Running secondary simulation
[03:18:04] INFO: Run complete, CPU time: 5834.155613
03:18:04 (17041): called boinc_finish(0)
</stderr_txt>
]]>
The other Result looks similar:
Result Name: BETA_AC0002_T000_F00087_S00001n_1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
: Completed step 3965000 of initial simulation
[02:37:17] INFO: Completed step 3966000 of initial simulation
[02:37:19] INFO: Completed step 3967000 of initial simulation
[02:37:21] INFO: Completed step 3968000 of initial simulation
etc.
[03:06:49] INFO: Completed step 4998000 of initial simulation
[03:06:51] INFO: Completed step 4999000 of initial simulation
[03:06:52] INFO: Completed step 5000000 of initial simulation
[03:06:52] INFO: Finished initial simulation.
[03:06:52] INFO: Running secondary simulation
[03:06:54] INFO: Run complete, CPU time: 5826.833518
03:06:54 (16570): called boinc_finish(0)
</stderr_txt>
]]>
Note:
- the superfluous logging (every 1 or 2 (or 3) s)
- that the beginning of the log is missing
- the truncation of the log at the start (the word "INFO" is missing before the colon)
[Edit 3 times, last edit by adriverhoef at Mar 12, 2016 1:13:44 PM]
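For anyone comparing logs from a hanging task with a healthy one, the spacing between checkpoints can be read out of stderr text mechanically. The sketch below assumes only the two line shapes visible in the logs quoted in this thread ("[HH:MM:SS] INFO: Completed step N of initial simulation" and "Writing checkpoint at step N."); the function names are hypothetical. Checkpoint lines carry no timestamp of their own, so each one is given the time of the last "Completed step" line seen before it, which is an approximation.

```python
import re

# Matches either a timestamped progress line or a checkpoint line,
# in the two shapes that appear in the logs quoted in this thread.
EVENT_RE = re.compile(
    r"\[(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\] INFO: "
    r"Completed step \d+ of initial simulation"
    r"|Writing checkpoint at step (?P<ckpt>\d+)\."
)


def checkpoint_times(log_text):
    """Return (checkpoint_step, seconds_since_midnight) pairs.

    The time is that of the last progress line before the checkpoint line,
    or None if no progress line preceded it.
    """
    last_time = None
    result = []
    for match in EVENT_RE.finditer(log_text):
        if match.group("ckpt") is not None:
            result.append((int(match.group("ckpt")), last_time))
        else:
            last_time = (
                int(match.group("h")) * 3600
                + int(match.group("m")) * 60
                + int(match.group("s"))
            )
    return result


def checkpoint_gaps(log_text):
    """Return the seconds between consecutive timed checkpoints."""
    times = [t for _, t in checkpoint_times(log_text) if t is not None]
    # The modulo handles a log that runs past midnight.
    return [(later - earlier) % 86400 for earlier, later in zip(times, times[1:])]
```

Applied to the first result log above, the three checkpoints are about ten minutes apart. A log that ends with a checkpoint line and shows no further progress lines, as in the first post of this thread, would instead return a final checkpoint with nothing after it.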