Thread Status: Active
Total posts in this thread: 149
This topic has been viewed 12272 times and has 148 replies.
BKraayev
Cruncher
Joined: Mar 23, 2005
Post Count: 46
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

crying

Mine failed after 60+ hours: "Aborting task faah5013_1hvl_1hxw_00_0: exceeded CPU time limit 232124.434010" - approx. 95% complete. I see that version _1 also failed and there are now two more copies of this task out for processing. Are those two people going to be disappointed too?

I see subsequent copies are coming back under the time limit, so I guess the work unit will eventually be completed. My copy shows as coming back at 0.0 hours with no credit claimed, so I guess I'm out of luck. Oh well, that's only twice in three years I've been disappointed - not too bad.
[Aug 4, 2008 5:15:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Second really long work unit received

The second cruncher's WU just returned. Here are the results from both:

faah5009_4hvp_1b6m_01_0--
Valid 07/29/2008 22:20:35 08/04/2008 02:38:30 62.39 710.2 / 724.6

faah5009_4hvp_1b6m_01_1--
Valid 07/29/2008 22:13:39 08/04/2008 17:21:37 66.01 739.1 / 724.6

<smile>
[Aug 4, 2008 5:36:44 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: Second really long work unit received

Hi Kevin,

It is good to know some of these things.

I dug out the two values you mentioned and fed them into my calculator. The result is between 72 and 73 hours. Since I calculated my work to take 57 hours, I shouldn't have to worry about it timing out, right? And I don't need to change anything. I can now rest easy that it won't time out.

Question: Why such large numbers? Those who don't know how to use the copy/paste feature are bound to make a mistake in copying the data.

Josh
[Aug 4, 2008 6:57:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Second really long work unit received

bieberj, it doesn't matter exactly what number you change it to, so long as it is a lot bigger than it was. Personally, I just add an extra zero.

All you need to edit is client_state.xml, but it is really important to make sure BOINC is stopped when you do this, otherwise your change will be overwritten instantly.
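
As a rough illustration of the edit described above, the sketch below multiplies every <rsc_fpops_bound> value in a client_state.xml-style text by 10 (the "extra zero"). The sample fragment, the work unit name `example_wu`, the function name `raise_fpops_bound`, and the regex approach are all assumptions made for illustration; a real client_state.xml contains many more fields, and BOINC must be stopped before the real file is touched.

```python
import re

# Hypothetical, heavily trimmed fragment in the style of client_state.xml.
SAMPLE = """<workunit>
    <name>example_wu</name>
    <rsc_fpops_bound>600000000000000.000000</rsc_fpops_bound>
</workunit>
"""

# Matches the opening tag, the numeric value, and the closing tag.
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def raise_fpops_bound(xml_text: str, factor: float = 10.0) -> str:
    """Return xml_text with every rsc_fpops_bound value multiplied by factor."""
    def _scale(match: re.Match) -> str:
        new_value = float(match.group(2)) * factor
        return f"{match.group(1)}{new_value:f}{match.group(3)}"

    return BOUND_RE.sub(_scale, xml_text)


if __name__ == "__main__":
    # Works on an in-memory string only; nothing is written to disk.
    print(raise_fpops_bound(SAMPLE))
```

The sketch deliberately works on a string rather than the file itself, so it can be tried safely before deciding whether to edit the real file by hand.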
[Aug 4, 2008 7:05:42 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Second really long work unit received

Unfortunately I can't run any tests. I was so unhappy about getting only a 50% claimed-to-granted ratio and losing 366 hours due to "CPU limit exceeded" that I stopped accepting FAAH work units and aborted all those waiting to process. I am still contributing to all of the other WCG projects. All of my machines have 2 GB of memory. I have one quad processing a 50xx unit (40 hours so far, 5 more to go). It shows commit memory at 291.724 and working set 109,592 with no hard faults. That does not seem too bad, but it would be interesting to see how memory was being used on the XP-64 machine with four going at the same time.
[Aug 4, 2008 7:26:05 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Second really long work unit received

knreed: Thank you for the post explaining how the CPU time limits are calculated (and stored and modified). If nothing else, those of us who have been following these forum threads have learnt a bit about the workings of the WCG/BOINC software systems.
Look in <host_info> for <p_fpops>. This is the fpops value your computer got while running the BOINC benchmark. Next, look for the field <rsc_fpops_bound> within the <workunit> tag for one of these long-running workunits. The client will stop running the workunit after rsc_fpops_bound/p_fpops seconds have gone by.
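
The rule quoted above can be written out as a small calculation. This is only a sketch: the two input numbers are made-up placeholders, not values from any real host or work unit, and the function name is an assumption.

```python
def cpu_time_limit_seconds(rsc_fpops_bound: float, p_fpops: float) -> float:
    """CPU seconds after which the client stops the work unit."""
    if p_fpops <= 0:
        raise ValueError("p_fpops must be positive")
    return rsc_fpops_bound / p_fpops


# Placeholder values (assumptions for illustration only).
P_FPOPS = 2.0e9            # benchmark result found under <host_info>
RSC_FPOPS_BOUND = 5.0e14   # bound found under the <workunit> entry

limit_s = cpu_time_limit_seconds(RSC_FPOPS_BOUND, P_FPOPS)
print(f"limit: {limit_s:.0f} s = {limit_s / 3600:.1f} h")
```

For comparison, the limit of 232124.434010 seconds quoted in the abort message earlier in this thread works out to roughly 64.5 hours.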

This means that computers that take longer to crunch WUs than their BOINC benchmarks suggest are more likely to exceed their time limit. Such computers also over-claim points for their results. Cheats! My Athlon64 is such a computer, and 1 out of 2 long WUs timed out on it. My Intel C2Q crunched about 15 without a hiccup. It under-claims slightly, if at all. 64-bit BOINC clients would also be more likely to time out.
If WCG get back our CPU types, they might check whether some CPUs are over-represented among the WUs that timed out.

WU / CPU / Hours / Claimed / Granted / Granted per hour
The quick brown Intel quad ...
faah5011_1hvi_1hef_00_0 / q9450 / 24.49 / 602.5 / 600.3 / 24.5
dddt0602i0496_100150_0 / q9450 / 3.52 / 87.2 / 100.6 / 28.6
... Jumps way over the lazy AMD dog:
faah5010_1hvh_1gnn_00_0 / a64x2 / 39.87 / 663.1 / 484.3 / 12.1
dddt0602i0491_100551_0 / a64x2 / 5.24 / 85.5 / 75.9 / 14.5

Simple averages of granted/hour: A64X2 13.3; Q9450 26.5
BOINC benchmarks:
Q9450: 3854/8028 FP/Int MIPS/CPU
A64X2: 2826/5095 FP/Int MIPS/CPU
Benchmark Ratios, A64/Q9450: FP 0.733; Int 0.635; unbiased average 0.684; Points granted 0.502
A64 claims to be 68% as fast as Q9450 but is only 50% as fast on points granted, AND where points granted are the average of the 2 quorum claims, the granting is biased towards the over-claimer.
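
The ratios above can be re-derived from the benchmark and granted-per-hour figures listed in this post. A minimal sketch; the variable names are assumptions, and the numbers are copied from the table and benchmark lines.

```python
# BOINC benchmark figures (FP and Int MIPS per CPU) as posted.
Q9450 = {"fp": 3854, "int": 8028}
A64X2 = {"fp": 2826, "int": 5095}

# Granted points per hour for the two work units run on each CPU.
granted_per_hour = {"q9450": [24.5, 28.6], "a64x2": [12.1, 14.5]}


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


fp_ratio = A64X2["fp"] / Q9450["fp"]
int_ratio = A64X2["int"] / Q9450["int"]
bench_ratio = mean([fp_ratio, int_ratio])
granted_ratio = mean(granted_per_hour["a64x2"]) / mean(granted_per_hour["q9450"])

print(f"FP {fp_ratio:.3f}; Int {int_ratio:.3f}; average {bench_ratio:.3f}")
print(f"granted per hour ratio {granted_ratio:.3f}")
```

With unrounded averages the granted ratio comes out near 0.501 rather than 0.502; the small difference comes from rounding the Q9450 average of 26.55 down to 26.5 before dividing.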

Brown Intel poo
------------------ (all over) tongue
. . . . AMD . . . .
----------------------------------------
[Edit 5 times, last edit by Rickjb at Aug 5, 2008 1:15:58 PM]
[Aug 5, 2008 10:01:49 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Second really long work unit received

We have just awarded credit to computers whose results timed out because they took too long to run, provided they reported the CPU time they ran for. (More results will be arriving, and we will re-run this in a couple of days to pick those up.)

We awarded credit to:

Users: 866
Hosts: 1097
Results: 1244
BOINC Credit: 679,030.17
World Community Grid Points: 4,753,211.19
Run Time: 8.7 years

You can check to see if you were awarded credit for your long running result by going to the results status page and checking if the granted credit column for the result has a non-zero value.

[edit - revised to note the fact that if 0 cpu time was reported then credit was not awarded]
----------------------------------------
[Edit 1 times, last edit by knreed at Aug 5, 2008 9:25:51 PM]
[Aug 5, 2008 4:09:22 PM]
BKraayev
Cruncher
Joined: Mar 23, 2005
Post Count: 46
Status: Offline
Re: Second really long work unit received

We have just awarded credit to all of the computers whose results timed out due to taking too long to run. (more will be arriving and we will re run this in a couple of days to pick those up).

We awarded credit to:

Users: 866
Hosts: 1097
Results: 1244
BOINC Credit: 679,030.17
World Community Grid Points: 4,753,211.19
Run Time: 8.7 years

You can check to see if you were awarded credit for your long running result by going to the results status page and checking if the granted credit column for the result has a non-zero value.

[Aug 5, 2008 8:54:41 PM]
BKraayev
Cruncher
Joined: Mar 23, 2005
Post Count: 46
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

crying

Mine failed after 60+ hours: "Aborting task faah5013_1hvl_1hxw_00_0: exceeded CPU time limit 232124.434010" - approx. 95% complete. I see that version _1 also failed and there are now two more copies of this task out for processing. Are those two people going to be disappointed too?

I see subsequent copies are coming back under the time limit, so I guess the work unit will eventually be completed. My copy shows as coming back at 0.0 hours with no credit claimed, so I guess I'm out of luck. Oh well, that's only twice in three years I've been disappointed - not too bad.

The note about awarding credit to ALL those that failed due to exceeding CPU time isn't quite true. Apparently you have no way of calculating credits for those that exceeded the limit and reported back 0 time. I've seen a few other posts about 0 time being reported, so I'm not alone. But losing 60+ hours once every three years isn't really a big deal, so I'm willing to eat the loss.
[Aug 5, 2008 8:58:21 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Longest WU ever, progressing, aborted:"Exceeded CPU time limit"

Good point. I've revised my note to include that fact.
[Aug 5, 2008 9:26:08 PM]