World Community Grid Forums
Category: Completed Research
Forum: Discovering Dengue Drugs - Together - Phase 2 Forum
Thread: 12 Hour Cut Off Not Working?
Thread Status: Active Total posts in this thread: 8
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
I've seen and posted to this thread, so I understand what is going on with the WUs that show less than 10% done and 1d+ to complete. What I don't understand is why the 12-hour CPU time cutoff isn't working. I have this WU running now:
Name: ts02_c395_sr78a1_1
I also have the following WUs from the current distribution that have run in excess of 12 CPU hours. Is the 12-hour cutoff not working, or am I missing something?
WUName CPU
Bill P
|
||
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges: |
I didn't think DDDT2 had any cutoff. CEP2 has a 12-hr cutoff.
---------------------------------------- |
||
|
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
Quote (kateiacy): "I didn't think DDDT2 had any cutoff. CEP2 has a 12-hr cutoff."

Yes it does. The WU I posted as in progress completed with the following:

Result Name         App Version  Status  Sent Time         Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
ts02_c395_sr78a1_0  617          Error   2/2/11 19:36:59   2/3/11 22:26:43         14.87             257.8 / 0.0   <- Wingman reporting same
ts02_c395_sr78a1_1  617          Error   2/2/11 19:36:57   2/4/11 02:54:04         12.57             236.0 / 0.0   <- Mine

Result Log
Result Name: ts02_c395_sr78a1_1
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76F722A1
Engaging BOINC Windows Runtime Debugger...

Bill P
Edit: highlight error
----------------------------------------[Edit 1 times, last edit by wplachy at Feb 4, 2011 3:22:25 AM] |
||
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges: |
That certainly is interesting. One of my machines has an Atom processor, so it is absurdly slow (but reliable). It regularly takes way longer than 12 hrs on DDDT2 tasks and never gets stopped. Here are the results from its last few DDDT2 tasks; most took over 20 hrs and all are valid. These aren't inappropriately long-running tasks, just a slow processor.
Result Name         Device Name    Status  Sent Time          Return Time        CPU Time (hours)  Claimed / Granted BOINC Credit
ts02_b087_pca012_1  kate-jetway64  Valid   1/24/11 22:58:34   1/26/11 12:50:12   21.97             114.3 / 115.5
ts02_b041_pcb010_1  kate-jetway64  Valid   1/24/11 11:34:09   1/25/11 13:05:58   19.33             100.9 / 69.0
ts02_a372_pr91a1_1  kate-jetway64  Valid   1/22/11 03:31:43   1/23/11 22:32:38   23.42             120.9 / 114.4
ts02_a341_sr67a1_0  kate-jetway64  Valid   1/21/11 20:16:11   1/23/11 04:09:22   8.45              43.7 / 39.3
ts02_a349_pr02a1_0  kate-jetway64  Valid   1/21/11 19:20:04   1/23/11 17:20:24   22.61             117.2 / 79.7
ts02_a304_sqa004_1  kate-jetway64  Valid   1/21/11 10:17:36   1/22/11 23:26:11   8.42              43.8 / 24.8
ts02_a290_pcb009_1  kate-jetway64  Valid   1/21/11 03:05:01   1/22/11 23:26:11   22.04             114.7 / 79.8
ts02_a271_sqa009_0  kate-jetway64  Valid   1/21/11 01:39:33   1/22/11 19:02:16   8.97              48.6 / 20.8
ts02_a273_pr23a1_1  kate-jetway64  Valid   1/21/11 01:02:04   1/22/11 19:01:47   25.12             136.2 / 68.5
ts02_a254_sr78a1_0  kate-jetway64  Valid   1/20/11 20:10:43   1/21/11 23:46:16   8.68              45.7 / 42.6
ts02_a260_sqb007_0  kate-jetway64  Valid   1/20/11 19:57:52   1/21/11 20:16:10   9.10              46.3 / 32.6
ts02_a254_pr78b0_0  kate-jetway64  Valid   1/20/11 18:58:30   1/22/11 19:01:47   24.37             132.1 / 73.2
ts02_a169_pcb006_0  kate-jetway64  Valid   1/19/11 22:04:39   1/21/11 19:20:04   25.56             126.9 / 112.1
ts02_a168_pcb002_1  kate-jetway64  Valid   1/19/11 22:01:25   1/21/11 11:06:59   23.30             117.6 / 112.8
ts02_a133_pqb009_1  kate-jetway64  Valid   1/19/11 13:01:40   1/21/11 13:08:49   27.04             141.7 / 72.0
||
|
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
Quote (kateiacy): "snip...That certainly is interesting."

Yes, it is! So I wonder what the "maximum time" is? Based on processor, perhaps?
Bill P
|
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2977 Status: Offline Project Badges: |
I believe that ALL the projects have a time-out limit: if a task goes outside the properties set for that particular project (e.g., if it goes into an infinite loop), then at some point it will be killed/aborted, and I'm guessing that's what's happening with these very, very long DDDT2 units. As to what that cutoff limit is, the techs will know. Let's say it's a certain number of CPU cycles, which a fast machine will reach far sooner than a slow one. So when the program steps outside these set parameters (of which time is only one of many), the WU aborts. This is the case with ALL properly written and tested computer programs.

With CEP2 it's different, as there's currently a hard-and-fast cutoff after 12 hours of CPU time, just like there was with NRftW.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The FAQs have held the answer for a long time, and it was even mentioned in a post yesterday: all tasks have a cutoff of 10x the FPOPS estimate in the task header **. So if an Atom needs 5 hours to complete those estimated FPOPS, it will run 50 hours tops. If a Q6600 needs 1 hour, it will run 10 hours tops before the "maximum elapsed time exceeded" error ***.

The CEP2 12 CPU hour cutoff is indiscriminate. No matter what CPU, 12 CPU hours is the hard cutoff, and that is what the discussion is about.

--//--
** [always subject to change without notice]
*** Read "elapsed" here as CPU time, not wall-clock time, and do not confuse it with the "elapsed" column in the BOINC Manager, which shows the wall time a task was allowed to run... yes, not confusing at all ;P
[Edit 1 times, last edit by Former Member at Feb 4, 2011 8:24:02 AM]
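The arithmetic in the post above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the factor of 10 and the 12-hour CEP2 limit are taken from the post, which itself notes they may change without notice.

```python
# Hypothetical names; the values come from the forum post, not from BOINC source.
CEP2_HARD_CUTOFF_HOURS = 12.0  # fixed CPU-hour limit, the same on every host


def scaled_cutoff_hours(estimated_cpu_hours, factor=10.0):
    """CPU-hour cutoff that scales with the time this host needs for the estimated FPOPS."""
    return factor * estimated_cpu_hours


def cutoff_hours(project, estimated_cpu_hours):
    """Return the CPU-hour cutoff: fixed for CEP2, scaled for everything else."""
    if project == "CEP2":
        return CEP2_HARD_CUTOFF_HOURS
    return scaled_cutoff_hours(estimated_cpu_hours)


# The two hosts from the post: an Atom needing 5 h and a Q6600 needing 1 h.
for host, est in [("Atom", 5.0), ("Q6600", 1.0)]:
    print(host, "DDDT2:", cutoff_hours("DDDT2", est), "h | CEP2:", cutoff_hours("CEP2", est), "h")
```

The point of the comparison is that the scaled cutoff follows the host's speed, while the CEP2 cutoff does not.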
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
Quote (gb009761): "I believe that ALL the projects have a time out limit - one which, if it goes outside the set properties of that particular project e.g., if it goes into an infinite loop, then at some point, it'll be killed/abort, and I'm guessing that that's what's happening with these very, very long DDDT2 units. Now, as to what that 'cut off limit' is, the techs will know (let's say, it's a certain number of CPU cycles - which, depending on whether the computer in question is a fast machine, will be reached far quicker than a slower machine). Hence, by the program stepping outside of these set parameters (of which, time will only be one of many), the WU aborts - this is the case with ALL properly written and tested computer programs."

Yes, all BOINC tasks have, among other things, a time limit. Each task has its own limits on max disk usage, max upload file size and max #flops. If any of these 3 limits is violated, the task errors out and is terminated (*). If the upload limit is exceeded, the affected file will not be uploaded at all. There's a separate memory limit, but this can be exceeded as long as the memory preference limits set by the user aren't violated.

Max_flops gives the CPU time limit. For single-threaded applications the CPU time limit is (despite it being called elapsed_time):

max_elapsed_time = rsc_fpops_bound / host_info.p_fpops

where host_info.p_fpops is the floating-point benchmark of the computer. So, if a computer with a benchmark of 1000 Mflops has a limit of, let's say, 24 hours on a task, a computer with a 2000 Mflops benchmark will have a 12-hour limit on the same task.

fpops_bound is often set to N * fpops_est, where fpops_est gives the estimated CPU time. Using N between 5 and 10 is very common; some BOINC projects use an even larger N to be on the safe side. Exactly how large an N DDDT2 uses I don't know, and can't check at this point...

Edit - I see SekeRob has commented:
"So if an atom needs 5 hours to complete those est. FPOPS, it will run 50 hours tops."
AFAIK that's not entirely correct: even with N = 10, the estimated run time includes DCF, so if the Atom has a DCF = 2, the cutoff limit will AFAIK be 25 hours in this example...

(*) The disk limit can be exceeded by CPDN without the task erroring out.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Edit 2 times, last edit by Ingleside at Feb 4, 2011 8:58:58 AM]
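As a rough check of the numbers in the post above, here is a minimal Python sketch of the formula max_elapsed_time = rsc_fpops_bound / host_info.p_fpops. The function names are hypothetical, the rsc_fpops_bound value is back-computed from the 1000 Mflops / 24 hour example rather than taken from any real task, and the DCF adjustment follows the "AFAIK" reasoning in the post, so treat it as an assumption and not as confirmed client behaviour.

```python
SECONDS_PER_HOUR = 3600.0


def max_elapsed_hours(rsc_fpops_bound, p_fpops):
    """CPU-time limit in hours: rsc_fpops_bound / p_fpops, converted from seconds."""
    return rsc_fpops_bound / p_fpops / SECONDS_PER_HOUR


def cutoff_with_dcf_hours(displayed_estimate_hours, dcf, n=10.0):
    """Assumed model: the displayed estimate already includes the duration
    correction factor (DCF), so divide it out before applying the factor N."""
    return n * displayed_estimate_hours / dcf


# Hypothetical bound chosen so that a 1000 Mflops host gets a 24-hour limit.
bound = 1000e6 * 24 * SECONDS_PER_HOUR

print(max_elapsed_hours(bound, 1000e6))  # limit on the 1000 Mflops host
print(max_elapsed_hours(bound, 2000e6))  # limit on the 2000 Mflops host
print(cutoff_with_dcf_hours(5.0, dcf=2.0))  # Atom example with DCF = 2
```

Doubling the benchmark halves the limit for the same task, which is why a faster host can hit "Maximum elapsed time exceeded" after fewer CPU hours than a slow one.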