| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 12
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One of my servers, running FreeBSD-CURRENT with linux_base-fc4 emulation, started dumping all workunits some days ago with the following error:
Aborting task dddt0101a0210_ZINC05148158-0000_03_0: exceeded CPU time limit 773.619105
There are at least three other machines with a similar setup, and only this one is failing. I tried to narrow down the difference, but failed. Any hints? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I recall that a few DDDT work units overran their space limit recently. However, if that limit is in seconds, then something is wrong.
You have posted in the beta forum. Was this a beta workunit? |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7849 Status: Offline Project Badges:
|
Is your machine date correct? You might also check this thread from another project, which may have an explanation:
http://www.nanohive-1.org/atHome/forum_thread.php?id=197
Cheers
Sgt. Joe
----------------------------------------*Minnesota Crunchers* [Edit 1 times, last edit by Sgt.Joe at Oct 21, 2007 4:34:48 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No, this is a regular work unit; however, as FreeBSD is not a supported platform, I decided to post here. Should I repost to the regular support forum?
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hi dmarck,
I got a couple of the "dddt0101a0210" series, so I put them forward, and they have gone way past the 773 seconds. Can you copy the list of failed WUs from your Result Status pages so we know if it's batch-limited, and can you confirm they are all timing out at around 13 minutes into the job?
Cheers
Added: don't worry about reposting. We'll have the thread moved to DDDT or BOINC support as and when we know it's limited to this project.
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Oct 21, 2007 4:45:33 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Is your machine date correct? You might also check this thread from another project which may have an explanation. http://www.nanohive-1.org/atHome/forum_thread.php?id=197"
One of the first changes I make to my regular config is setting up ntpd, so the time is correct. As for the thread you referred to, I ran the benchmarks explicitly. The client reports:
20:40:41 Resuming computation
After waiting another 10 minutes, I see the workunit has passed 15 minutes of CPU time (which is more than the 770 seconds reported in the logs). So I assume the CPU benchmarks had somehow gone mad. Thanks for the tip! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Got a couple of the "dddt0101a0210" series, so put them forward and gone way past the 773 seconds. Can you copy the list of failed WU's from your Result Status pages so we know if it's batch limited and can you confirm they all are timing out at around 13 minutes into the job."
After rerunning the benchmarks (see above), the workunits still fail. And this is not dddt-specific. Here is the grep result from the boinc-client logs:
Oct 18 19:46:11 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task dddt0101a0196_ZINC05466961-0000_03_0: exceeded CPU time limit 768.356602
Oct 18 19:46:15 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task dddt0101a0197_ZINC05467486-0000_01_0: exceeded CPU time limit 768.356602
Oct 18 20:00:03 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task faah2515_ZINC02048049_xmd03210_02_0: exceeded CPU time limit 743.457526
Oct 18 20:00:36 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task dddt0101a0198_ZINC00902783-0001_03_1: exceeded CPU time limit 768.356602
Oct 18 20:18:49 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task dddt0101a0198_ZINC01076150-0000_00_0: exceeded CPU time limit 768.356602
Oct 18 20:23:40 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task dddt0101a0198_ZINC01079204-0000_02_0: exceeded CPU time limit 768.356602
Oct 18 20:34:25 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task dddt0101a0198_ZINC02381773-0000_06_1: exceeded CPU time limit 768.356602
Oct 18 20:39:01 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task faah2515_ZINC03954379_xmd03210_00_0: exceeded CPU time limit 743.457526
Oct 18 20:50:05 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task dddt0101a0198_ZINC02392119-0000_02_1: exceeded CPU time limit 768.356602
Oct 18 20:54:48 <daemon.info> woozlie boinc[1058]: [World Community Grid] Aborting task dddt0101a0198_ZINC03452270-0000_00_1: exceeded CPU time limit 768.356602
Oct 18 21:13:24 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC02872295-0000_02_1: exceeded CPU time limit 768.356602
Oct 18 21:16:27 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05026996-0000_03_1: exceeded CPU time limit 768.356602
Oct 18 21:28:59 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027024-0000_04_0: exceeded CPU time limit 768.356602
Oct 18 21:32:05 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027039-0000_05_1: exceeded CPU time limit 768.356602
Oct 18 21:44:48 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027540-0000_00_0: exceeded CPU time limit 768.356602
Oct 18 21:49:41 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task faah2515_ZINC05665000_xmd03210_01_1: exceeded CPU time limit 743.457526
Oct 18 22:00:23 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027649-0000_03_0: exceeded CPU time limit 768.356602
Oct 18 22:05:19 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027663-0001_00_0: exceeded CPU time limit 768.356602
Oct 18 22:15:29 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task faah2516_ZINC00624859_xmd03220_01_0: exceeded CPU time limit 743.457526
Oct 18 22:20:55 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05027690-0000_01_1: exceeded CPU time limit 768.356602
Oct 18 22:31:05 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028211-0000_02_0: exceeded CPU time limit 768.356602
Oct 18 22:36:34 <daemon.info> woozlie boinc[23720]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028216-0001_02_0: exceeded CPU time limit 768.356602
Oct 18 22:59:05 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028231-0000_02_1: exceeded CPU time limit 768.356602
Oct 18 23:00:28 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028207-0000_00_1: exceeded CPU time limit 768.356602
Oct 18 23:14:38 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028310-0000_03_1: exceeded CPU time limit 768.356602
Oct 18 23:16:48 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task dddt0101a0198_ZINC05028347-0000_01_0: exceeded CPU time limit 768.356602
Oct 18 23:30:15 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task dddt0101a0198_ZINC05468346-0000_00_1: exceeded CPU time limit 768.356602
Oct 18 23:31:59 <daemon.info> woozlie boinc[24115]: [World Community Grid] Aborting task faah2516_ZINC01612632_xmd03220_01_1: exceeded CPU time limit 743.457526
Oct 21 00:43:24 <daemon.info> woozlie boinc[34269]: [World Community Grid] Aborting task dddt0101a0198_ZINC05468441-0000_02_0: exceeded CPU time limit 768.356602
Oct 21 00:43:31 <daemon.info> woozlie boinc[34269]: [World Community Grid] Aborting task dddt0101a0198_ZINC05468429-0000_01_0: exceeded CPU time limit 768.356602
Oct 21 01:00:32 <daemon.info> woozlie boinc[34269]: [World Community Grid] Aborting task dddt0101a0207_ZINC05115721-0000_02_0: exceeded CPU time limit 750.443811
Oct 21 01:02:17 <daemon.info> woozlie boinc[34269]: [World Community Grid] Aborting task faah2524_ZINC01871093_xmd03300_01_1: exceeded CPU time limit 740.859743
Oct 21 01:19:01 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task faah2524_ZINC03899593_xmd03300_01_1: exceeded CPU time limit 740.859743
Oct 21 01:21:32 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0207_ZINC05115648-0001_01_0: exceeded CPU time limit 750.443811
Oct 21 01:34:18 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0207_ZINC05115766-0000_01_1: exceeded CPU time limit 750.443811
Oct 21 19:36:19 <daemon.info> woozlie boinc[725]: [World Community Grid] Aborting task faah2524_ZINC03954148_02_xmd03300_02_0: exceeded CPU time limit 740.859743
Oct 21 20:02:22 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05148124-0000_03_1: exceeded CPU time limit 773.619105
Oct 21 20:11:57 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05148158-0000_03_0: exceeded CPU time limit 773.619105
Oct 21 20:18:04 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05148230-0000_04_0: exceeded CPU time limit 773.619105
Oct 21 20:27:38 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05154248-0000_00_1: exceeded CPU time limit 773.619105
Oct 21 20:33:44 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0211_ZINC05089101-0000_00_1: exceeded CPU time limit 773.619105
Oct 21 20:46:26 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05151870-0001_01_1: exceeded CPU time limit 773.619105 |
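For anyone wanting to check whether the limits cluster by batch, here is a minimal Python sketch that groups abort messages like the ones in this post by task-name prefix and reported limit. The function name `summarize_aborts` and the regular expression are my own assumptions based only on the message format visible in this thread, not on any BOINC tooling; the three sample lines are copied from the log in this post.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption, not an
# official BOINC format). \s+ lets it survive lines that were wrapped mid-message.
ABORT_RE = re.compile(
    r"Aborting\s+task\s+(?P<task>\S+):\s+exceeded\s+CPU\s+time\s+limit\s+"
    r"(?P<limit>\d+(?:\.\d+)?)"
)


def summarize_aborts(log_text: str) -> Counter:
    """Count aborted tasks per (task-name prefix, CPU time limit) pair."""
    counts = Counter()
    for match in ABORT_RE.finditer(log_text):
        # The prefix is everything before the first underscore, e.g. "dddt0101a0196".
        prefix = match.group("task").split("_", 1)[0]
        counts[(prefix, float(match.group("limit")))] += 1
    return counts


SAMPLE = """\
Oct 18 19:46:11 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task dddt0101a0196_ZINC05466961-0000_03_0: exceeded CPU time limit 768.356602
Oct 18 20:00:03 <daemon.info> woozlie boinc[727]: [World Community Grid] Aborting task faah2515_ZINC02048049_xmd03210_02_0: exceeded CPU time limit 743.457526
Oct 21 20:02:22 <daemon.info> woozlie boinc[724]: [World Community Grid] Aborting task dddt0101a0210_ZINC05148124-0000_03_1: exceeded CPU time limit 773.619105
"""

summary = summarize_aborts(SAMPLE)
for (prefix, limit), n in sorted(summary.items()):
    print(f"{prefix}: limit {limit:.6f} s ({limit / 60:.1f} min), {n} abort(s)")
```

Feeding it the whole grep output instead of `SAMPLE` would show at a glance how many distinct limits occur and which batches they belong to.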
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The benchmark has no relevance vis-à-vis the actual crunching. It's just a once-every-five-days performance measurement used to determine the credit claim per hour. Tasks come with an internal estimate of fpops, so the client can compute the estimated duration based on the benchmark, but regardless, the task would just continue; in fact, it has a built-in timeout set at a multiple of these estimated fpops.
All jobs failing 13-14 minutes into the job is novel. Recently someone had an issue and only revealed a week later that the machine was a server they could not reboot to see if that fixed the issue... it did.
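To make the arithmetic behind the estimate and the timeout concrete, here is a rough Python sketch. It assumes the estimated duration is the task's estimated fpops divided by the benchmarked speed, and that the timeout is a fixed multiple of the estimated fpops divided by the same speed. All names and numbers below are hypothetical illustrations, not values taken from the client or from any real workunit. Under that assumption, a wildly inflated benchmark figure would shrink the time limit, which is the possibility dmarck raised; nothing in this thread confirms it.

```python
def estimated_duration_s(fpops_est: float, benchmark_flops: float) -> float:
    """Estimated run time in seconds: estimated operations / benchmarked speed."""
    if benchmark_flops <= 0:
        raise ValueError("benchmark_flops must be positive")
    return fpops_est / benchmark_flops


def cpu_time_limit_s(fpops_bound: float, benchmark_flops: float) -> float:
    """CPU time limit in seconds, assuming it is the operation bound / benchmarked speed."""
    if benchmark_flops <= 0:
        raise ValueError("benchmark_flops must be positive")
    return fpops_bound / benchmark_flops


# Hypothetical values, chosen only so the arithmetic is easy to follow.
FPOPS_EST = 3.0e13                         # estimated operations for one task
BOUND_MULTIPLE = 10                        # assumed multiple used for the timeout
FPOPS_BOUND = FPOPS_EST * BOUND_MULTIPLE   # 3.0e14 operations

sane = cpu_time_limit_s(FPOPS_BOUND, 1.0e9)        # plausible benchmark: 1 GFLOPS
inflated = cpu_time_limit_s(FPOPS_BOUND, 4.0e11)   # implausibly high benchmark

print(f"estimate at 1 GFLOPS:        {estimated_duration_s(FPOPS_EST, 1.0e9):.0f} s")
print(f"limit with sane benchmark:     {sane:.0f} s")
print(f"limit with inflated benchmark: {inflated:.0f} s")
```

With these made-up numbers the limit drops from 300000 seconds to 750 seconds, the same order as the 13-minute aborts reported here. That only shows the hypothesis is arithmetically plausible; comparing the benchmark figures on the failing host with those on the three working hosts would be the real test.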
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"The benchmark has no relevance v.v. the actual crunching. It's just a once-per-five-days performance measurement used to determine the credit claim per hour. Tasks come with an internal estimate of fpops, so the client can compute the estimated duration based on the benchmark, but regardless, it would just continue and in fact, has a built-in timeout with a multiple of these estimated fpops."
Thanks for the explanation.
"All jobs failing 13-14 minutes into the job is novel. Recently someone had an issue and only revealed a week after it was a server and could not boot to see if that fixed an issue.... it did."
As this is a development server, I did not hesitate to update it several times, turn some debugging kernel features on and off, and of course reboot in between. No help so far. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes, in the future please post things like this in BOINC support, or the relevant project forum. No worries, though.
Please will you post the contents of your client_state.xml file? |
||
|
|
|