| World Community Grid Forums
Thread Status: Active Total posts in this thread: 89
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There was no app change, or was there? As far as I know, there's no rule that lets a device's 'reliable' status expire over time, so if everything started going out with init 2 distribution, it may have been a technician enforcing it.
Whilst 20 was mentioned several times and observed first hand, I saw on Android that the number may have been set to 5 for FAHV before zero-redundancy entitlement kicks in. That seems to be the original Berkeley number before switching to init 1. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi.
Got this error, even though the log doesn't show one. Lost over ten hours.
Error 9/2/14 22:36:59 9/4/14 21:32:55 10.40 / 10.73 137.7 / 0.0
Result Log
Result Name: E225125_416_S.348.C45H29N7.KYMGRXNNDNEUHP-UHFFFAOYSA-N.13_s1_14_0--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[09:00:41] Number of jobs = 8
[15:57:33] Starting job 6,CPU time has been restored to 22713.954160.
[15:57:33] Starting new Job
[15:57:33] Qink name = fldman
[15:57:43] Qink name = gesman
[15:57:45] Qink name = scfman
Application exited with RC = 0x100
[20:04:07] Finished Job #6
[20:04:07] Starting job 7,CPU time has been restored to 37194.587739.
[20:04:07] Skipping Job #7
20:04:13 (5742): called boinc_finish
</stderr_txt>
[Edit 1 times, last edit by Former Member at Sep 4, 2014 9:44:02 PM] |
||
|
|
KLiK
Master Cruncher Croatia Joined: Nov 13, 2006 Post Count: 3108 Status: Offline Project Badges:
|
CEP2 WU's started pouring, finally! :D
---------------------------------------- |
||
|
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges:
|
CEP2 WU's started pouring, finally! :D
Much better!
---------------------------------------- [Edit 1 times, last edit by jonnieb-uk at Sep 5, 2014 8:36:24 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That's about the midday statistics times two for the fourth.
Still seeing reports of these 'Application exited with RC = 0x100' errors; I'm not going to touch CEP2 until this gets proper treatment. It's not a failure of the user's device. The concern, in fact, is that different devices can bow out at different phases without hitting the 18-hour wall; at least, I don't see 37194.587739 seconds (about 10.3 hours) as being 18 hours. We all make our choices. |
||
|
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges:
|
I haven't hit the 18-hour wall on any of my four machines (3 Ivy Bridge, 1 Haswell); the maximum has been 10.5 hours on the new work units. And all 5 errors thus far have been the above-noted "Application exited with RC = 0x100" after Job #6. So that should be an easy test for them to distinguish the computations blowing up from real machine problems, at least in most cases. (It might miss a few cases, but it would still go a long way toward keeping machines from losing zero-redundancy status.)
At least I hope so; otherwise, they will lose computing power to unnecessary redundancy, as well as from people leaving. I will stick with it, since my machines are all set up for it now with ramdisks that aren't really needed for other work. But I would prefer that they think about a solution, if they have not already. [Edit 1 times, last edit by Jim1348 at Sep 5, 2014 2:38:53 PM] |
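Jim's proposed check could be sketched roughly as follows. This is a hypothetical illustration, not how WCG's validator actually works: the function `classify` and its labels are made up here, and only the marker strings ("Starting job N", "Application exited with RC = 0x100", "Killing job because cpu time limit has been exceeded") are copied from the stderr logs quoted in this thread.

```python
import re

# Hypothetical heuristic; the marker strings are copied from logs in this thread.
JOB_START = re.compile(r"Starting job (\d+)")
RC_EXIT = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")
CPU_LIMIT = "Killing job because cpu time limit has been exceeded"


def classify(stderr: str) -> str:
    """Give a CEP2 stderr log a rough label (illustrative only)."""
    if CPU_LIMIT in stderr:
        return "cpu-time-limit"
    rc = RC_EXIT.search(stderr)
    if rc is None:
        return "no-known-marker"
    # Which job numbers were started before the exit code was logged?
    started = [int(m.group(1)) for m in JOB_START.finditer(stderr, 0, rc.start())]
    if rc.group(1) == "0x100" and started and started[-1] == 6:
        # The pattern Jim1348 reports: most likely the computation, not the host.
        return "computation-blowup"
    return "possible-host-problem"


if __name__ == "__main__":
    # Trimmed excerpt of the log posted earlier in this thread.
    log = (
        "[15:57:33] Starting job 6,CPU time has been restored to 22713.954160.\n"
        "[15:57:45] Qink name = scfman\n"
        "Application exited with RC = 0x100\n"
        "[20:04:07] Finished Job #6\n"
    )
    print(classify(log))  # computation-blowup
```

A rule like this would only decide whether an error counts against the device's reliability; as Jim notes, it could still misjudge a few edge cases.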
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Nobody has responded to this thread, which seems to sit in a parallel universe where information doesn't seep across from one thread to the next: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,37164 It was posted today and is not passing muster; it's not the owner's device's fault that it hit the 18-hour guillotine.
|
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 397 Status: Offline Project Badges:
|
Reposting ... thanks lavaflow for the bump!
This unit was marked as error after hitting the 18 hr cutoff. Seems like a waste of computing time.
E225138_290_S.364.C49H35N5.BGMOXQDZFQOESW-UHFFFAOYSA-N.9_s1_14_3-- - In Progress 9/5/14 13:15:24 9/9/14 01:15:24 0.00 0.0 / 0.0
E225138_290_S.364.C49H35N5.BGMOXQDZFQOESW-UHFFFAOYSA-N.9_s1_14_2-- 640 Error 9/4/14 14:49:30 9/5/14 13:14:06 16.37 137.7 / 0.0
E225138_290_S.364.C49H35N5.BGMOXQDZFQOESW-UHFFFAOYSA-N.9_s1_14_1-- 640 Pending Verification 9/3/14 19:57:05 9/4/14 06:43:40 1.51 118.1 / 0.0
E225138_290_S.364.C49H35N5.BGMOXQDZFQOESW-UHFFFAOYSA-N.9_s1_14_0-- 640 Error 9/3/14 19:54:10 9/4/14 14:43:46 18.00 437.6 / 0.0 <== me
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:54:45] Number of jobs = 8
..
[08:53:11] Starting job 6,CPU time has been restored to 58270.945703.
[08:53:12] Starting new Job
[08:53:12] Qink name = fldman
[08:53:27] Qink name = gesman
[08:53:29] Qink name = scfman
Killing job because cpu time limit has been exceeded. 58270.945703||6529.600073||0.000000
[10:42:38] Finished Job #6
10:42:46 (28416): called boinc_finish
----------------------------------------
Intel i5-650, 8GB, Ubuntu Server 12.04 LTS
[Edit 3 times, last edit by AgrFan at Sep 5, 2014 11:57:35 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It really shouldn't matter if the CPU clock runs at 800MHz, as long as the unit is completed before the 10-day deadline, or if the CPU time is 2 full days, as long as it's error free. The problem is that if the CPU is slow enough that it doesn't finish the first job (which is quite long now) before the 18-hour time limit, then nothing gets uploaded to the CEP scientists (as far as I understand it), which essentially wastes your 18 hours. Also, since it doesn't checkpoint until the first job is done, if your computer restarts, the workunit resets to 0%. So while it's true that slower CPUs still contribute to the grid, at some point CEP becomes impractical and that computer should be switched to different projects.
However, they all come with a 10-day deadline; the 18-hour window, I trust, is to make sure it doesn't loop. Why would anyone want it done in 10 days but require an 18-hour run window? It's like giving a person a week to paint a picture, not paying anything for it, but telling them they only get to spend 3 hours on it. The molecules they are testing are getting so large that no home computer will be able to handle an 18-hour run window. Besides, I don't think they really care about the last test, because it's the ones before it that tell whether a molecule has a chance of becoming a valid candidate. At this stage we are just pretesting for them to see if a molecule has a chance. All we are doing is sorting the "maybes" out of the pile, only presenting viable candidates. In time, as the electron cloud becomes ever larger, no computer will finish in an 18-hour window. You'll see. I'm starting to see I have a much better feel for what we are doing than you do. I'm not surprised by that; I spent half my life in labs. [Edit 1 times, last edit by Former Member at Sep 6, 2014 11:02:32 AM] |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7849 Status: Offline Project Badges:
|
However, they all come with a 10-day deadline; the 18-hour window, I trust, is to make sure it doesn't loop. Why would anyone want it done in 10 days but require an 18-hour run window? It's like giving a person a week to paint a picture, not paying anything for it, but telling them they only get to spend 3 hours on it.
You are correct that the 18-hour window is in place as a loop limiter. However, the 10-day deadline is in place for those who wish to maintain a cache of several days. It also helps those who do not run 24/7 (although, with this project's long checkpoint intervals, that may not be the best use of their computer time). The two separate time windows serve different functions. Cheers
Sgt. Joe
*Minnesota Crunchers* |
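The two limits Sgt. Joe describes can be sketched as two independent checks. This is a hypothetical model, not WCG's scheduler code: the function names are made up, and reading the two numbers in AgrFan's "Killing job" line as restored CPU time plus the current job's CPU time is an assumption. Under that assumption the sum, 64800.545776 s, is just past 18 × 3600 = 64800 s, which would explain why that unit was killed at 18.00 hours of CPU time even though it was returned well within the 10-day deadline.

```python
from datetime import datetime, timedelta

# Hypothetical model of the two limits discussed in this thread.
CPU_LIMIT_S = 18 * 3600        # 18-hour CPU-time cap (loop limiter), in seconds
DEADLINE = timedelta(days=10)  # 10-day wall-clock return deadline (allows caching)


def hours(seconds: float) -> float:
    """Convert a CPU-time figure from the logs into hours."""
    return seconds / 3600


def over_cpu_limit(restored_s: float, current_job_s: float) -> bool:
    """CPU-time check. Assumes total = CPU time restored from earlier jobs
    plus CPU time spent in the current job (an interpretation, not documented)."""
    return restored_s + current_job_s > CPU_LIMIT_S


def past_deadline(sent: datetime, returned: datetime) -> bool:
    """Deadline check: measures elapsed wall-clock time, not CPU time."""
    return returned - sent > DEADLINE


if __name__ == "__main__":
    # Figures taken from AgrFan's result log earlier in the thread.
    print(over_cpu_limit(58270.945703, 6529.600073))  # True
    print(past_deadline(datetime(2014, 9, 3, 19, 54, 10),
                        datetime(2014, 9, 4, 14, 43, 46)))  # False
    print(round(hours(37194.587739), 2))  # 10.33
```

Keeping the checks separate mirrors the point above: one guards against a looping computation, the other only bounds how long a unit may sit in a cache or on a part-time machine.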