Total posts in this thread: 41
This topic has been viewed 5684 times and has 40 replies
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

Progress increases until the task writes a new checkpoint at 7.8%. The last 2 lines in stderr.txt are:
[08:43:21] INFO: Completed step 390000 of initial simulation
Writing checkpoint at step 390151.

and afterwards nothing at all.
The process keeps using a full core, but no new checkpoints are made and progress stays the same.
The other BETAs were processed in about 3 hours and finished.
The restarted one is still running after 4.5 hours and it looks like it will never end.

After more than 6 hours of running, the task had not yet proven endless, but waiting any longer was taking too long for me. So I decided to trash all the
wcg_checkpoint_##.ckp files except the 00 one. After restarting the task, checkpointing is working again and progress is OK.
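
The workaround above (removing every wcg_checkpoint_##.ckp file except the 00 one) could be scripted roughly as follows. This is only a sketch: the file-name pattern is taken from the post, while the function names and any slot path passed in are hypothetical. It defaults to a dry run that only lists the files, and it presumably should be run only while the task is not running.

```python
import re
from pathlib import Path

# Pattern assumed from the file names mentioned in the post: wcg_checkpoint_NN.ckp
CKP_PATTERN = re.compile(r"^wcg_checkpoint_(\d+)\.ckp$")


def stale_checkpoints(slot_dir, keep=0):
    """Return the numbered checkpoint files in slot_dir whose number is not `keep`."""
    result = []
    for path in Path(slot_dir).iterdir():
        match = CKP_PATTERN.match(path.name)
        if match and int(match.group(1)) != keep:
            result.append(path)
    return sorted(result)


def remove_stale_checkpoints(slot_dir, keep=0, dry_run=True):
    """List every numbered checkpoint except `keep`; delete them unless dry_run is True."""
    targets = stale_checkpoints(slot_dir, keep)
    if not dry_run:
        for path in targets:
            path.unlink()
    return [path.name for path in targets]
```

Calling remove_stale_checkpoints() on the task's slot directory first shows what would be deleted; passing dry_run=False actually removes the files. Files that do not match the pattern, such as state.cpt, are never touched.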
[Mar 11, 2016 2:27:01 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322

The restarted one is still running after 4.5 hours and it looks like it will never end.

Do you see any activity in the designated "slots/" directory (files updated, timestamp updates), Crystal Pellet?

Some files were updated, but not all the expected ones, like the checkpoint files.
[Mar 11, 2016 2:29:54 PM]
i007008
Cruncher
Joined: Sep 16, 2005
Post Count: 21

Issue report:

Received 4 beta tasks, with an estimated completion time of approximately 19 hours per task.

Two beta tasks finished completely and correctly within 5 to 6 hours. Two remaining tasks running apparently correctly.

Rebooted my i3 laptop to resolve an apparently never-ending UGM task; the UGM task fixed itself and completed correctly after some time. The beta tasks apparently restarted correctly: one 36% complete, the other 88% complete.

After about 6 or 7 hours, I noticed that the 2 beta tasks were clocking up “Progress” correctly, but the estimated time remaining stayed constant: it never decreased, not even by one second.

Rebooted the laptop again; both beta tasks have started from 0%, but the time remaining is now decreasing correctly. On both reboots I exited BOINC before rebooting.

There is nothing relevant in the Event Log because of the reboot. Task names:

11/03/2016 15:30:17 | World Community Grid | task BETA_AC0002_T000_F00043_S00001g_0 resumed by user
11/03/2016 15:30:17 | World Community Grid | task BETA_AC0002_T000_F00044_S00001j_0 resumed by user

Windows 8.1, i3 laptop, BOINC version 7.6.22 (x64)

Thanks guys.

Regards
Chris
[Mar 11, 2016 3:34:28 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695

I am looking into the checkpoint/resume issue causing tasks to hang. It looks like everyone who is reporting that is experiencing the issue on Windows. If anyone has had this issue on a platform other than Windows please post. Also can someone who is having the issue confirm whether or not they ran any workunits in the previous version and had the issue?

Thanks,
armstrdj
[Mar 11, 2016 3:34:32 PM]
i007008
Cruncher
Joined: Sep 16, 2005
Post Count: 21

Hi armstrdj,

Just to confirm that these 4 beta units were the first I had received - I got none from the previous beta version. Sorry.
[Mar 11, 2016 3:44:09 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322

Also can someone who is having the issue confirm whether or not they ran any workunits in the previous version and had the issue?

In the BETA task of Feb 25 I did not have this issue.
The checkpoints worked fine after a suspend/resume.
The only oddities were that the state files were written regardless of the set checkpoint interval, and that the progress went to 100%, then dropped back to about 77% and started increasing again.

Addition: Retested again:
5 minutes after the resume (my WTD, the "write to disk" interval, is 60 s), only 5 files are still changing:
state.cpt
state_prev.cpt
stderr.txt
wcg_checkpoint_00.ckp
wcg_hst1.state

Not the other ckp-files.
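
The retest above (seeing which files are still being touched after a resume) can be repeated with a small script. This is a minimal sketch that only compares file modification times; the function name and the 300-second window are illustrative choices, not anything provided by BOINC or WCG.

```python
import time
from pathlib import Path


def split_by_activity(slot_dir, window_seconds=300, now=None):
    """Split the files in slot_dir into (recently modified, not recently modified).

    A file counts as active if its modification time lies within
    `window_seconds` of `now` (default: the current time).
    """
    now = time.time() if now is None else now
    active, idle = [], []
    for path in sorted(Path(slot_dir).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        age = now - path.stat().st_mtime
        (active if age <= window_seconds else idle).append(path.name)
    return active, idle
```

On a task in the state described here, the numbered .ckp files other than wcg_checkpoint_00.ckp would be expected to end up in the second list.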
----------------------------------------
[Edit 1 times, last edit by Crystal Pellet at Mar 11, 2016 5:33:10 PM]
[Mar 11, 2016 4:00:42 PM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325

I did not process any of the Feb 25th HST beta work units.
----------------------------------------
[Edit 1 times, last edit by ca05065 at Mar 12, 2016 8:24:06 AM]
[Mar 12, 2016 12:06:32 AM]
mito7
Advanced Cruncher
Slovakia
Joined: Oct 12, 2008
Post Count: 58

Mine got stuck too after a stop/start (with LAIM off), but only after the next checkpoint.

BETA_AC0002_T000_F00068_S00001i
[Mar 12, 2016 7:26:54 AM]
JSYKES
Senior Cruncher
Joined: Apr 28, 2007
Post Count: 200

Sorry guys, I can't add much to this thread. I had 3 WUs on two PCs running WProx64 (two other PCs running the same OS didn't receive any; targeted Beta?), and all ran straight through in 1.5 hours, give or take a couple of minutes. It was all so quick that they had arrived and departed again before I was aware of the issue of Beta WUs!
[Mar 12, 2016 8:28:41 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167

I received two WUs on one Linux system; both are Pending Validation at the moment; one Result Log looks like this:

Result Name: BETA_AC0002_T000_F00088_S00001a_1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
: Completed step 3965000 of initial simulation
[02:47:30] INFO: Completed step 3966000 of initial simulation
[02:47:31] INFO: Completed step 3967000 of initial simulation
[02:47:33] INFO: Completed step 3968000 of initial simulation
... 237 lines snipped ...
[02:54:23] INFO: Completed step 4206000 of initial simulation
[02:54:25] INFO: Completed step 4207000 of initial simulation
[02:54:26] INFO: Completed step 4208000 of initial simulation
Writing checkpoint at step 4208581.
[02:54:29] INFO: Completed step 4209000 of initial simulation
[02:54:30] INFO: Completed step 4210000 of initial simulation
[02:54:32] INFO: Completed step 4211000 of initial simulation
... 345 lines snipped ...
[03:04:24] INFO: Completed step 4557000 of initial simulation
[03:04:25] INFO: Completed step 4558000 of initial simulation
[03:04:27] INFO: Completed step 4559000 of initial simulation
Writing checkpoint at step 4559341.
[03:04:29] INFO: Completed step 4560000 of initial simulation
[03:04:30] INFO: Completed step 4561000 of initial simulation
[03:04:33] INFO: Completed step 4562000 of initial simulation
... 324 lines snipped ...
[03:14:22] INFO: Completed step 4887000 of initial simulation
[03:14:25] INFO: Completed step 4888000 of initial simulation
[03:14:26] INFO: Completed step 4889000 of initial simulation
Writing checkpoint at step 4889701.
[03:14:29] INFO: Completed step 4890000 of initial simulation
[03:14:30] INFO: Completed step 4891000 of initial simulation
[03:14:32] INFO: Completed step 4892000 of initial simulation
... 105 lines snipped ...
[03:17:58] INFO: Completed step 4998000 of initial simulation
[03:17:59] INFO: Completed step 4999000 of initial simulation
[03:18:01] INFO: Completed step 5000000 of initial simulation
[03:18:01] INFO: Finished initial simulation.
[03:18:02] INFO: Running secondary simulation
[03:18:04] INFO: Run complete, CPU time: 5834.155613
03:18:04 (17041): called boinc_finish(0)

</stderr_txt>
]]>


The other Result looks similar:

Result Name: BETA_AC0002_T000_F00087_S00001n_1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
: Completed step 3965000 of initial simulation
[02:37:17] INFO: Completed step 3966000 of initial simulation
[02:37:19] INFO: Completed step 3967000 of initial simulation
[02:37:21] INFO: Completed step 3968000 of initial simulation
etc.
[03:06:49] INFO: Completed step 4998000 of initial simulation
[03:06:51] INFO: Completed step 4999000 of initial simulation
[03:06:52] INFO: Completed step 5000000 of initial simulation
[03:06:52] INFO: Finished initial simulation.
[03:06:52] INFO: Running secondary simulation
[03:06:54] INFO: Run complete, CPU time: 5826.833518
03:06:54 (16570): called boinc_finish(0)

</stderr_txt>
]]>


Note:
- the superfluous logging (every 1 or 2 (or 3) s)
- that the beginning of the log is missing
- the truncation of the log at the start (the word "INFO" is missing before the colon)
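
The checkpoint cadence in the first log can also be read off mechanically. The sketch below assumes only the two line formats visible in the logs above ("[HH:MM:SS] INFO: Completed step N of initial simulation" and "Writing checkpoint at step N."); the function names are made up for illustration. Because the checkpoint lines carry no timestamp of their own, each one is paired with the timestamp of the last progress line before it.

```python
import re
from datetime import datetime

# Line formats copied from the stderr excerpts in this post.
STEP_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] INFO: Completed step (\d+) of initial simulation")
CKP_RE = re.compile(r"^Writing checkpoint at step (\d+)\.")


def checkpoint_times(lines):
    """Pair each checkpoint step with the timestamp of the preceding progress line."""
    last_stamp = None
    found = []
    for line in lines:
        line = line.strip()
        match = STEP_RE.match(line)
        if match:
            last_stamp = match.group(1)
            continue
        match = CKP_RE.match(line)
        if match:
            found.append((int(match.group(1)), last_stamp))
    return found


def seconds_between(stamps):
    """Gaps in seconds between consecutive HH:MM:SS stamps (same day assumed)."""
    parsed = [datetime.strptime(stamp, "%H:%M:%S") for stamp in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]
```

Fed the three checkpoint lines of the first log and the progress lines just before them (02:54:26, 03:04:27, 03:14:26), this gives gaps of 601 and 599 seconds, so roughly one checkpoint every 10 minutes. The truncated first line of each log, which lacks a timestamp, is simply skipped.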
----------------------------------------
[Edit 3 times, last edit by adriverhoef at Mar 12, 2016 1:13:44 PM]
[Mar 12, 2016 11:46:55 AM]