Thread Status: Active
Total posts in this thread: 99
This topic has been viewed 354307 times and has 98 replies.
foxfire
Advanced Cruncher
United States
Joined: Sep 1, 2007
Post Count: 121
Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Suspended at a checkpoint, shut down BOINC, then started BOINC again. All of the following resumed from their checkpoints:
BETA_OET1_0000297_xZAGP_0784_0
BETA_OET1_0000297_xZAGP_0785_0
BETA_OET1_0000297_xZAGP_0780_0
BETA_OET1_0000297_xZAGP_0721_1
BETA_OET1_0000297_xZAGP_0727_1
BETA_OET1_0000297_xZAGP_0742_1
BETA_OET1_0000297_xZAGP_0749_1
BETA_OET1_0000298_xEBGP-FA_rig_0920_1



I'm also seeing some WUs that have long intervals between checkpoints:

WU; Elapsed; (CPU time); Time since last checkpoint
---------------------------------
BETA_OET1_0000298_xEBGP-FA_rig_0503_0; 00:30:28; (00:30:27); [0] 00:30:27
BETA_OET1_0000298_xEBGP-FA_rig_0474_1; 00:33:58; (00:33:48); [0] 00:33:48
BETA_OET1_0000298_xEBGP-FA_rig_0276_0; 00:28:08; (00:27:57); [0] 00:27:57
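
For anyone who wants to check a list like the one above automatically, here is a minimal Python sketch. It assumes the semicolon-separated layout shown here, ignores the bracketed value in the last column (its meaning is not stated), and uses a hypothetical 1000-second limit to decide what counts as "long". The function names are made up for illustration.

```python
import re

# Rows copied from the list above.
ROWS = """\
BETA_OET1_0000298_xEBGP-FA_rig_0503_0; 00:30:28; (00:30:27); [0] 00:30:27
BETA_OET1_0000298_xEBGP-FA_rig_0474_1; 00:33:58; (00:33:48); [0] 00:33:48
BETA_OET1_0000298_xEBGP-FA_rig_0276_0; 00:28:08; (00:27:57); [0] 00:27:57
"""


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_row(line):
    """Split one row into WU name, elapsed, CPU and since-checkpoint seconds."""
    name, elapsed, cpu, since = (field.strip() for field in line.split(";"))
    # The last column looks like "[0] 00:30:27"; only the time part is used.
    since_time = re.search(r"\d{2}:\d{2}:\d{2}", since).group(0)
    return {
        "name": name,
        "elapsed_s": hms_to_seconds(elapsed),
        "cpu_s": hms_to_seconds(cpu.strip("()")),
        "since_checkpoint_s": hms_to_seconds(since_time),
    }


def long_intervals(rows_text, limit_s):
    """Return the rows whose time since the last checkpoint exceeds limit_s."""
    rows = [parse_row(line) for line in rows_text.splitlines() if line.strip()]
    return [row for row in rows if row["since_checkpoint_s"] > limit_s]


if __name__ == "__main__":
    # limit_s=1000 is an assumed threshold, not a value taken from the app.
    for row in long_intervals(ROWS, limit_s=1000):
        print(row["name"], row["since_checkpoint_s"], "s since checkpoint")
```

With these three rows, every WU is well past the assumed limit, since each has run for roughly half an hour without a checkpoint.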
[Jan 8, 2015 12:33:38 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

I'm now starting to believe that the rigid unit is checkpointing every 10% of the way through (but not based on time). When we know just how long these WUs might run we will have a better idea if that is often enough, but my feeling is that it probably isn't (though anything is better than nothing, of course).

I'm also seeing the progress jumping in 10% increments now, but after the 5th checkpoint it was showing 60% complete, not 50%, and it is now showing 70% without another checkpoint. That suggests it is going to sit for quite a while at 100% (or maybe it will then drop back and increment apparently more normally), but I don't think I'll still be awake to see it when it gets there. While we old hands know that this is not a problem, it would be good if this could be improved, as previous posts by newbies have shown that it confuses them, even to the point of killing WUs in the belief that they are "stuck".

Just my 2p'th.
[Jan 8, 2015 1:10:10 AM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 278

Suspended and resumed at 70%. No % complete loss, so it must have been very close to a checkpoint. (The CEP WU fell back as expected).

Edit: After running for several minutes it fell back to 10%. I concur that the progress indicator is not incrementing smoothly like that of the other work units. Batch 298.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------
[Edit 2 times, last edit by Paul Schlaffer at Jan 8, 2015 1:41:02 AM]
[Jan 8, 2015 1:16:50 AM]
DadX
Advanced Cruncher
Joined: Sep 9, 2006
Post Count: 56

Exited the app and stopped the processing between checkpoints. The WU restarted cleanly, losing about the amount of time I expected.
[Jan 8, 2015 2:34:37 AM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4894

I haven't found any problems. The 298s are completing without incident in between 1.50 and 1.98 hours. Congratulations to the techs.
[Jan 8, 2015 2:44:06 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666

Slightly off-topic, but ...

I haven't had any beta WUs for many months. I may have disabled participating in beta tests before leaving home for a week away, with all or most machines crunching away unattended, and forgotten to re-enable it upon my return.

Now I can't find the beta test participation option. I can't see it in Settings >> My Profile or anywhere in Device Manager, and the other functions in Settings would not be relevant.
Before the website was unimproved, access was via a sidebar option, in our member's profile I think, and there was only 1 setting to cover all devices.

Update: I just found the settings, 1 for each device, under "My Contribution".
Why is there not a box for this in each Device Profile, so that it can be reached under "Settings"?

Please add to website "to do" list.
And good luck with the current beta test.
[Jan 8, 2015 4:13:08 AM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187

I received four batch 295 WUs. All of them completed while I was at work, so I was not able to test checkpointing. All four ran simultaneously with no issues.

Cheers
[Jan 8, 2015 4:24:49 AM]
OldChap
Veteran Cruncher
UK
Joined: Jun 5, 2009
Post Count: 978

With LAIM (leave applications in memory) off, it seemed to fall back between 10 and 20% to a checkpoint. When I next looked some moments later, it was back to 10%.
[Jan 8, 2015 7:16:51 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 328

I had one 00298 work unit.

After the sequence LAIM off, suspend, "removed from memory" message, resume, running, I noticed the following:
Properties showed the CPU time at the last checkpoint as 1 hour 48 minutes, but the stderr file shows zero CPU time at restart:

Result Log

Result Name: BETA_OET1_0000298_xEBGP-FA_rig_1327_1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:41:20] Number of tasks = 1
[23:41:20] Starting task 0,CPU time is 0.000000
[23:41:20] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[00:18:01] Number of tasks = 1
[00:18:01] Starting task 0,CPU time is 0.000000
[00:18:01] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[10:34:12] Number of tasks = 1
[10:34:12] Starting task 0,CPU time is 0.000000
[10:34:12] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[11:01:36] Finished task #0 cpu time used 8109.087677
11:01:36 (192716): called boinc_finish

Note that the CPU time changes from 0 to 8109 seconds in 27 minutes (1620 seconds).
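
As a rough cross-check of those figures, here is a minimal Python sketch. It assumes the task is single-threaded (so CPU time accrued after the last restart cannot exceed wall-clock time) and that both timestamps fall within 24 hours of each other. The helper name `wall_seconds` is made up for illustration; the timestamps and CPU time are copied from the log above.

```python
from datetime import datetime, timedelta


def wall_seconds(start_hms, end_hms):
    """Wall-clock seconds between two HH:MM:SS stamps, allowing one midnight wrap."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(start_hms, fmt)
    end = datetime.strptime(end_hms, fmt)
    if end < start:
        # The log carries no dates, so assume the end stamp is on the next day.
        end += timedelta(days=1)
    return int((end - start).total_seconds())


# Values copied from the stderr log above.
last_restart = "10:34:12"      # final "Starting task 0" line
finished = "11:01:36"          # "Finished task #0" line
reported_cpu_s = 8109.087677   # CPU time reported at finish

wall_s = wall_seconds(last_restart, finished)

# Assumption: single-threaded task, so at most wall_s seconds of CPU time
# were accrued after the last restart. The remainder must have been
# carried over from a checkpoint.
carried_over_s = reported_cpu_s - wall_s

print(f"wall-clock since last restart: {wall_s} s")
print(f"CPU time carried over (lower bound): {carried_over_s:.0f} s")
```

Under those assumptions, at least about 6465 seconds (roughly 1 hour 48 minutes) of CPU time must have been restored from a checkpoint, which is close to what Properties showed. That would suggest the "CPU time is 0.000000" line at restart is a logging quirk rather than lost work, although the log alone cannot confirm it.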
[Jan 8, 2015 11:16:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

I ran 19 tasks from batches 295/296 on Windows, and most are valid. BOINCTasks shows the 'normal' number of checkpoints taken, 5-7 per task, even when the log indicates that, for instance, 45 jobs were included. This confirms that the app properly follows the 'Write to Disk at Most' setting, which is set to 1000 seconds.

Result Name: BETA_OET1_0000296_xZAGP_0644_0--
<core_client_version>7.4.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:27:37] Number of tasks = 45
[23:27:37] Starting task 0,CPU time is 0.000000
[23:27:37] ./ZINC12785727_1.pdbqt size = 29 5 ../../projects/www.worldcommunitygrid.org/beta20.xZAGP.pdbqt size = 2321 0
[08:05:25] Finished task #0 cpu time used 482.167891
...
[10:31:30] ./ZINC12788076_1.pdbqt size = 30 6 ../../projects/www.worldcommunitygrid.org/beta20.xZAGP.pdbqt size = 2321 0
[10:34:56] Finished task #44 cpu time used 205.921320
10:34:56 (9948): called boinc_finish
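
To illustrate why the number of checkpoints written can be much smaller than the number of jobs, here is a minimal Python sketch of interval-based throttling. It is not the app's actual code. It assumes a checkpoint opportunity arises at the end of each job and that one is only written once at least 1000 seconds have passed since the previous write. The function name and the job durations are hypothetical.

```python
def count_checkpoints(job_cpu_seconds, min_interval_s=1000.0):
    """Count checkpoints written when each job end is a checkpoint opportunity.

    A checkpoint is written only if at least min_interval_s seconds have
    accumulated since the previous write (or since the start of the task).
    """
    written = 0
    since_last_write = 0.0
    for duration in job_cpu_seconds:
        since_last_write += duration
        if since_last_write >= min_interval_s:
            written += 1
            since_last_write = 0.0
    return written


if __name__ == "__main__":
    # Hypothetical per-job durations in seconds, not taken from the log.
    durations = [480, 200, 350, 400, 210, 500]
    print(len(durations), "jobs ->", count_checkpoints(durations), "checkpoints")
```

With these made-up durations, six jobs give only two written checkpoints, because most job boundaries arrive before the interval has elapsed.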
[Jan 8, 2015 1:20:26 PM]