World Community Grid Forums
Thread Status: Active. Total posts in this thread: 118
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Now I'm really fed up with seeing more "exited with zero status but no 'finished' file" errors and losing all the processing done since the last checkpoint, so I've added `<name>beta11</name>` with `<max_concurrent>2</max_concurrent>` to app_config. If the problem still continues, it'll be down to only 1 at a time.
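For anyone wanting to try the same throttle, here is a minimal sketch of what the whole file might look like, assuming the standard BOINC `app_config.xml` layout saved in the World Community Grid project folder inside the BOINC data directory. `beta11` is the app name used in the post above; it is worth confirming the exact name against your own client_state.xml before relying on it.

```xml
<!-- app_config.xml: sketch only; save it in the WCG project folder of the BOINC data directory -->
<app_config>
  <app>
    <!-- app name as used in the post; confirm it against client_state.xml -->
    <name>beta11</name>
    <!-- run at most 2 tasks of this app at once; change to 1 to throttle further -->
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

After saving, Options > Read config files in BOINC Manager (or simply restarting the client) should make the new limit take effect.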

pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline
Have noticed that if I suspend a wu, it starts from the beginning again.
Patrick

Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline
Quote: Now I'm really fed up with seeing more "exited with zero status but no 'finished' file" and wasting processing since the last checkpoint, so I've added `<name>beta11</name>` `<max_concurrent>2</max_concurrent>` to app_config; if the problem still continues, it'll be down to only 1 at a time.

I understand your frustration, Tony; however, I consider you lucky, because I haven't been able to get any beta tasks for some time now. I am not complaining; I am just letting my thoughts out.

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote: Have noticed that if I suspend a wu, it starts from the beginning again.

Patrick, that happens only if the wu hasn't finished Job#0 (which, admittedly, is a long job). Checkpoints occur in a CEP2 wu at the end of each job. It also sounds like you need to turn on LAIM (Leave Applications In Memory while suspended) under Memory Usage in the Device Profile you're using; then a wu can continue from where it left off rather than from the previous checkpoint.
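The Job#0 explanation can be made concrete with a toy model. This is a hypothetical sketch, not the CEP2 application's actual code: it assumes a work unit is just a list of job durations (the hours below are made up) and that a checkpoint is written only when a job finishes, as described above. The function name `work_kept_after_resume` is invented for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def work_kept_after_resume(job_hours, elapsed_hours, kept_in_memory):
    """Return how many hours of a work unit survive a suspend/resume.

    Hypothetical toy model: the work unit is a sequence of jobs and a
    checkpoint is written only when a job finishes.
    """
    if kept_in_memory:
        # LAIM on: the paused process keeps its state, so nothing is lost.
        return elapsed_hours
    # Checkpoint times are the cumulative finish times of each job.
    checkpoints = list(accumulate(job_hours))
    # How many checkpoints had been written by the time of the suspend.
    written = bisect_right(checkpoints, elapsed_hours)
    # Resume from the last checkpoint, or from zero if Job#0 never finished.
    return checkpoints[written - 1] if written else 0.0


if __name__ == "__main__":
    jobs = [3.0, 1.0, 1.0]  # made-up durations: a long Job#0, then shorter jobs
    print(work_kept_after_resume(jobs, 2.5, kept_in_memory=False))  # 0.0
    print(work_kept_after_resume(jobs, 4.5, kept_in_memory=False))  # 4.0
    print(work_kept_after_resume(jobs, 4.5, kept_in_memory=True))   # 4.5
```

With these numbers, a task suspended 2.5 hours in (still inside Job#0) and evicted from memory goes back to zero, one suspended at 4.5 hours falls back to the 4.0-hour checkpoint, and with LAIM on nothing is lost.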

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote: ... I consider you lucky because I haven't been able to get any beta tasks for some time now.

It's more down to micromanagement than luck.

Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline
Quote: ... I consider you lucky because I haven't been able to get any beta tasks for some time now.
Quote: It's more down to micromanagement than luck.

I understand. I am sure I will pick some tasks up at some point.

KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline
So, I got a "Beta - The Clean Energy Project - Phase 2 needs 922.99MB more disk space. You currently have 1125.01 MB available and it needs 2048.00 MB." message on my 24-core Xeon server. Odd, considering it was running maybe 5 work units in total (19 cores idle).

On an unrelated note, my 16-core Xeon server has reached single result reliability on CEP Beta. Whodathunkit?

Distributed computing volunteer since September 27, 2000

pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline
Quote: Have noticed that if I suspend a wu, it starts from the beginning again.
Quote: Patrick, that happens only if the wu hasn't finished Job#0 (which, admittedly, is a long job). Checkpoints occur in a CEP2 wu at the end of each job. It also sounds like you need to turn on LAIM (Leave Applications In Memory while suspended) under Memory Usage in the Device Profile you're using; then a wu can continue from where it left off rather than from the previous checkpoint.

It already is. The computer also runs 24/7.

Patrick

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I see that the validator has been turned on for these beta WUs. Many of my results are now Valid, but I noticed one in Pending Verification: BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0--. It finished with RC = 0x1 in Job#0, while the wingman's _1 finished in Job#6, and _2 is In Progress. So it appears that convergence is run-dependent or machine-dependent in extreme cases. Is that to be expected?

Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Patrick, so LAIM is on, but a suspended wu still restarts from the beginning. Hmm, I can't explain that.