wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Status: Offline
12 Hour Cut Off Not Working?

I've seen and posted to this thread, so I understand what is going on with the WUs that show less than 10% done and more than a day left to complete. What I don't understand is why the 12-hour CPU time cutoff isn't working. I have this WU running now:
Name ts02_c395_sr78a1_1
Application Discovering Dengue Drugs - Together - Phase 2 6.17
Workunit name ts02_c395_sr78a1
State Running
Received 2/2/2011 1:36:56 PM
Report deadline 2/12/2011 1:36:57 PM
Estimated app speed 2.67 GFLOPs/sec
Estimated task size 12092 GFLOPs
CPU time at last checkpoint 12:16:33
CPU time 12:25:59
Elapsed time 12:27:41
Estimated time remaining 01d,07:03:00
Fraction done 9.000 %
Virtual memory size 491.43 MB
Working set size 9.93 MB
Directory slots/2
Process ID 4496

I also have the following WUs from the current distribution that have run for more than 12 CPU hours. Is the 12-hour cutoff not working, or am I missing something?
WU name            CPU time (hours)
ts02_b483_sqb003_1 12.45
ts02_b483_sqb004_0 12.45
ts02_b483_sqb005_1 12.47
ts02_b483_sqb006_0 12.48
ts02_c283_sda000_0 12.43
ts02_c201_sr89b1_0 12.56
ts02_c237_sr34b0_0 12.57
ts02_c237_sr56b0_0 12.58
ts02_c237_sr78b0_0 14.12
ts02_c237_sr89b1_0 12.58


Bill P

[Feb 4, 2011 3:05:37 AM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: 12 Hour Cut Off Not Working?

I didn't think DDDT2 had any cutoff. CEP2 has a 12-hr cutoff.
----------------------------------------

[Feb 4, 2011 3:13:10 AM]
wplachy
Re: 12 Hour Cut Off Not Working?

kateiacy wrote: "I didn't think DDDT2 had any cutoff. CEP2 has a 12-hr cutoff."

Yes it does. The WU I posted as in progress completed with the following:

Result name          App version  Status  Sent              Returned          CPU hours  Claimed / granted credit
ts02_c395_sr78a1_0   617          Error   2/2/11 19:36:59   2/3/11 22:26:43   14.87      257.8 / 0.0   <- Wingman reporting same
ts02_c395_sr78a1_1   617          Error   2/2/11 19:36:57   2/4/11 02:54:04   12.57      236.0 / 0.0   <- Mine

Result Log

Result Name: ts02_ c395_ sr78a1_ 1--

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76F722A1

Engaging BOINC Windows Runtime Debugger...

Bill P
Edit: highlighted the error

----------------------------------------
[Edit 1 times, last edit by wplachy at Feb 4, 2011 3:22:25 AM]
[Feb 4, 2011 3:20:23 AM]
kateiacy
Re: 12 Hour Cut Off Not Working?

That certainly is interesting. One of my machines has an Atom processor, so it is absurdly slow (but reliable). It regularly takes way longer than 12 hrs on DDDT2 tasks and never gets stopped. Here are the results from its last few DDDT2 tasks; most took over 20 hrs and all are valid. These aren't inappropriately long-running tasks, just a slow processor.

Result name          Device          Status  Sent               Returned           CPU hours  Claimed / granted credit
ts02_b087_pca012_1   kate-jetway64   Valid   1/24/11 22:58:34   1/26/11 12:50:12   21.97      114.3 / 115.5
ts02_b041_pcb010_1   kate-jetway64   Valid   1/24/11 11:34:09   1/25/11 13:05:58   19.33      100.9 / 69.0
ts02_a372_pr91a1_1   kate-jetway64   Valid   1/22/11 03:31:43   1/23/11 22:32:38   23.42      120.9 / 114.4
ts02_a341_sr67a1_0   kate-jetway64   Valid   1/21/11 20:16:11   1/23/11 04:09:22   8.45       43.7 / 39.3
ts02_a349_pr02a1_0   kate-jetway64   Valid   1/21/11 19:20:04   1/23/11 17:20:24   22.61      117.2 / 79.7
ts02_a304_sqa004_1   kate-jetway64   Valid   1/21/11 10:17:36   1/22/11 23:26:11   8.42       43.8 / 24.8
ts02_a290_pcb009_1   kate-jetway64   Valid   1/21/11 03:05:01   1/22/11 23:26:11   22.04      114.7 / 79.8
ts02_a271_sqa009_0   kate-jetway64   Valid   1/21/11 01:39:33   1/22/11 19:02:16   8.97       48.6 / 20.8
ts02_a273_pr23a1_1   kate-jetway64   Valid   1/21/11 01:02:04   1/22/11 19:01:47   25.12      136.2 / 68.5
ts02_a254_sr78a1_0   kate-jetway64   Valid   1/20/11 20:10:43   1/21/11 23:46:16   8.68       45.7 / 42.6
ts02_a260_sqb007_0   kate-jetway64   Valid   1/20/11 19:57:52   1/21/11 20:16:10   9.10       46.3 / 32.6
ts02_a254_pr78b0_0   kate-jetway64   Valid   1/20/11 18:58:30   1/22/11 19:01:47   24.37      132.1 / 73.2
ts02_a169_pcb006_0   kate-jetway64   Valid   1/19/11 22:04:39   1/21/11 19:20:04   25.56      126.9 / 112.1
ts02_a168_pcb002_1   kate-jetway64   Valid   1/19/11 22:01:25   1/21/11 11:06:59   23.30      117.6 / 112.8
ts02_a133_pqb009_1   kate-jetway64   Valid   1/19/11 13:01:40   1/21/11 13:08:49   27.04      141.7 / 72.0
----------------------------------------

[Feb 4, 2011 3:50:18 AM]
wplachy
Re: 12 Hour Cut Off Not Working?

kateiacy wrote: "[snip] That certainly is interesting."

Yes, it is! So I wonder what the "maximum time" is?? Based on processor perhaps?

Bill P

[Feb 4, 2011 3:57:14 AM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2977
Status: Offline
Re: 12 Hour Cut Off Not Working?

I believe that ALL projects have a timeout limit: if a task goes outside the bounds set for that particular project (for example, if it goes into an infinite loop), then at some point it will be killed or aborted, and I'm guessing that's what is happening with these very, very long DDDT2 units. As to what that cutoff limit is, the techs will know. Let's say it's a certain number of CPU cycles, which a fast machine will reach far sooner than a slow one. So when the program steps outside these set parameters (of which time is only one of many), the WU aborts. This is the case with ALL properly written and tested computer programs.

With CEP2 it's different, as there is currently a hard and fast cutoff after 12 hours of CPU time, just like there was with NRftW.
----------------------------------------

[Feb 4, 2011 4:05:18 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12 Hour Cut Off Not Working?

The FAQs have held the answer for a long time, and it was even mentioned in a post yesterday: all tasks have a cutoff of 10x the FPOPS estimate in the task header **. So if an Atom needs 5 hours to complete those estimated FPOPS, it will run 50 hours at most. If a Q6600 needs 1 hour, it will run 10 hours at most before hitting "maximum elapsed time exceeded" ***.
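The rule above can be sketched in a few lines of Python. This is only an illustration, not BOINC code: the factor of 10 is the value quoted in this post (and may change), the helper names are made up, and applying the rule to the "Estimated task size" and "Estimated app speed" figures from the first post assumes those are the inputs the client actually uses.

```python
# Illustrative sketch only, not BOINC code. The factor of 10 is the value
# quoted in this post and is "subject to change without notice".
CUTOFF_FACTOR = 10  # assumption taken from the post above


def cutoff_hours(estimated_cpu_hours, factor=CUTOFF_FACTOR):
    """CPU hours a task may use before 'maximum elapsed time exceeded'."""
    return estimated_cpu_hours * factor


def estimate_hours(task_size_gflops, app_speed_gflops_per_sec):
    """Estimated CPU hours from a task size and an app speed (hypothetical helper)."""
    return task_size_gflops / app_speed_gflops_per_sec / 3600


# The two examples from the post.
print(cutoff_hours(5))  # Atom that needs 5 hours for the estimated FPOPS
print(cutoff_hours(1))  # Q6600 that needs 1 hour

# Figures from the first post: 12092 GFLOPs at 2.67 GFLOPs/sec.
# Assumption: these are the values the cutoff is computed from.
first_post_cutoff = cutoff_hours(estimate_hours(12092, 2.67))
print(round(first_post_cutoff, 2))
```

Under those assumptions the task from the first post comes out at roughly 12.6 CPU hours, which is close to the 12.57 hours at which it ended in error.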

The CEP2 12 CPU hour cutoff is indiscriminate. No matter what CPU, 12 CPU hours is the hard cut off, and that is what the discussion is about.

--//--

** [always subject to change without notice]

*** Read ''elapsed'' here as CPU time, not wall-clock time, and don't confuse it with the ''elapsed'' column in the BOINC Manager, which shows the wall-clock time a task was allowed to run... yes, not confusing at all ;P
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 4, 2011 8:24:02 AM]
[Feb 4, 2011 8:23:07 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: 12 Hour Cut Off Not Working?

gb009761 wrote: "I believe that ALL the projects have a time out limit - one which, if it goes outside the set properties of that particular project e.g., if it goes into an infinite loop, then at some point, it'll be killed/abort, and I'm guessing that that's what's happening with these very, very long DDDT2 units. Now, as to what that 'cut off limit' is, the techs will know (let's say, it's a certain number of CPU cycles - which, depending on whether the computer in question is a fast machine, will be reached far quicker than a slower machine). Hence, by the program stepping outside of these set parameters (of which, time will only be one of many), the WU aborts - this is the case with ALL properly written and tested computer programs."

Yes, all BOINC tasks have, among other things, a time limit. Each task has its own limits on maximum disk usage, maximum upload file size and maximum number of floating-point operations (flops). If any of these 3 limits is violated, the task errors out and is terminated (*). If the upload limit is exceeded, the affected file will not be uploaded at all. There's a separate memory limit, but this can be exceeded as long as the memory preference limits set by the user aren't violated.

Max_flops gives the CPU time limit. For single-threaded applications the CPU time limit is (despite being called elapsed_time):
max_elapsed_time = rsc_fpops_bound / host_info.p_fpops

where host_info.p_fpops is the floating-point benchmark of the computer.

So, if one computer has a benchmark of 1000 Mflops and a limit of let's say 24 hours on a task, a computer with a 2000 Mflops-benchmark will have a 12-hour-limit on the same task.
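A minimal Python sketch of the formula above, using the 1000 and 2000 Mflops example from this post. The function name and the way the bound is chosen (so that the slower host gets exactly 24 hours) are assumptions made for illustration; this is not the BOINC client's code.

```python
SECONDS_PER_HOUR = 3600


def max_elapsed_seconds(rsc_fpops_bound, p_fpops):
    """CPU time limit in seconds: the flops bound divided by the host benchmark."""
    return rsc_fpops_bound / p_fpops


# Benchmarks from the example, in floating-point operations per second.
slow_host = 1000e6  # 1000 Mflops
fast_host = 2000e6  # 2000 Mflops

# Assumption: pick a bound that gives the slower host a 24-hour limit.
bound = 24 * SECONDS_PER_HOUR * slow_host

for name, benchmark in (("slow", slow_host), ("fast", fast_host)):
    limit_hours = max_elapsed_seconds(bound, benchmark) / SECONDS_PER_HOUR
    print(name, limit_hours)
```

The same bound gives the host with twice the benchmark half the time limit, which is why a fast machine can hit the cutoff after about 12 hours while a slow one keeps running.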

fpops_bound is often set to N * fpops_est, where fpops_est gives the estimated CPU time. An N between 5 and 10 is very common; some BOINC projects use an even larger N to be on the safe side. Exactly how large an N DDDT2 uses I don't know, and I can't check at this point...

edit - I see SekeRob has commented:
"So if an atom needs 5 hours to complete those est. FPOPS, it will run 50 hours tops."

AFAIK that's not quite correct. Even with N = 10, the estimated run time includes the DCF (duration correction factor), so if the Atom has a DCF of 2, the cutoff limit will AFAIK be 25 hours in this example...
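The DCF correction described above can be written out as a short Python sketch. It simply follows the "AFAIK" reasoning in this post and has not been checked against the BOINC client; the function and parameter names are hypothetical.

```python
def cutoff_hours_with_dcf(displayed_estimate_hours, dcf, n=10):
    """Cutoff in CPU hours when the displayed estimate already includes the DCF.

    Assumption (from the post above): the limit is N times the raw estimate,
    and the raw estimate is the displayed estimate divided by the DCF.
    """
    raw_estimate_hours = displayed_estimate_hours / dcf
    return n * raw_estimate_hours


# Atom example: 5 hours displayed, DCF of 2, N = 10.
print(cutoff_hours_with_dcf(5, 2))

# With a DCF of 1 the result is the plain "10x the estimate" figure.
print(cutoff_hours_with_dcf(5, 1))
```

With a DCF of 2 the Atom's limit is 25 hours rather than 50, as in the example above.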



(*) The disk-limit can be exceeded by CPDN without task erroring-out.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
----------------------------------------
[Edit 2 times, last edit by Ingleside at Feb 4, 2011 8:58:58 AM]
[Feb 4, 2011 8:48:49 AM]