Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 2468 times and has 8 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Computation error (exceeded elapsed time limit - after running for 45 minutes)

I have about 150 FAAH WUs that normally run for about 12 hours each on my computer.
Yesterday I had 12 WUs in my queue with an estimated runtime of about 2 to 3 minutes(!).
When I came back today, each of them had ended after about 45 minutes with Status "Computation error":

2013-07-15T11:50:11 CEST | World Community Grid | Aborting task faah42749_ZINC58358216_2_xBr27_refmac2_A_PR_01_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T11:50:13 CEST | World Community Grid | Computation for task faah42749_ZINC58358216_2_xBr27_refmac2_A_PR_01_0 finished
2013-07-15T11:50:13 CEST | World Community Grid | Starting task faah42749_ZINC01153622_4_xBr27_refmac2_A_PR_03_0 using faah version 715 in slot 3
2013-07-15T12:40:02 CEST | World Community Grid | Aborting task faah42749_ZINC01153622_4_xBr27_refmac2_A_PR_03_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T12:40:03 CEST | World Community Grid | Computation for task faah42749_ZINC01153622_4_xBr27_refmac2_A_PR_03_0 finished
2013-07-15T12:40:03 CEST | World Community Grid | Starting task faah42749_ZINC32768144_1_xBr27_refmac2_A_PR_00_0 using faah version 715 in slot 3
2013-07-15T13:18:44 CEST | World Community Grid | Computation for task faah42749_ZINC02787188_1_xBr27_refmac2_A_PR_02_0 finished
2013-07-15T13:18:44 CEST | World Community Grid | Starting task faah42750_ZINC12560308_1_xBr27_refmac2_A_PR_03_0 using faah version 715 in slot 0
2013-07-15T13:29:52 CEST | World Community Grid | Aborting task faah42749_ZINC32768144_1_xBr27_refmac2_A_PR_00_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T13:29:54 CEST | World Community Grid | Computation for task faah42749_ZINC32768144_1_xBr27_refmac2_A_PR_00_0 finished
2013-07-15T13:29:54 CEST | World Community Grid | Starting task faah42749_ZINC12468409_1_xBr27_refmac2_A_PR_04_0 using faah version 715 in slot 3
2013-07-15T13:51:05 CEST | World Community Grid | Computation for task faah42749_ZINC44890386_1_xBr27_refmac2_A_PR_00_0 finished
2013-07-15T13:51:05 CEST | World Community Grid | Starting task faah42749_ZINC09730386_1_xBr27_refmac2_A_PR_01_0 using faah version 715 in slot 2
2013-07-15T14:08:31 CEST | World Community Grid | Aborting task faah42750_ZINC12560308_1_xBr27_refmac2_A_PR_03_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T14:08:33 CEST | World Community Grid | Computation for task faah42750_ZINC12560308_1_xBr27_refmac2_A_PR_03_0 finished
2013-07-15T14:08:33 CEST | World Community Grid | Starting task faah42749_ZINC20192873_1_xBr27_refmac2_A_PR_00_0 using faah version 715 in slot 0
2013-07-15T14:19:40 CEST | World Community Grid | Aborting task faah42749_ZINC12468409_1_xBr27_refmac2_A_PR_04_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T14:19:42 CEST | World Community Grid | Computation for task faah42749_ZINC12468409_1_xBr27_refmac2_A_PR_04_0 finished
2013-07-15T14:19:42 CEST | World Community Grid | Starting task faah42749_ZINC44890379_1_xBr27_refmac2_A_PR_02_0 using faah version 715 in slot 3
2013-07-15T14:40:54 CEST | World Community Grid | Aborting task faah42749_ZINC09730386_1_xBr27_refmac2_A_PR_01_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T14:40:56 CEST | World Community Grid | Computation for task faah42749_ZINC09730386_1_xBr27_refmac2_A_PR_01_0 finished
2013-07-15T14:40:56 CEST | World Community Grid | Starting task faah42749_ZINC09312898_3_xBr27_refmac2_A_PR_02_0 using faah version 715 in slot 2
2013-07-15T14:58:23 CEST | World Community Grid | Aborting task faah42749_ZINC20192873_1_xBr27_refmac2_A_PR_00_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T14:58:25 CEST | World Community Grid | Computation for task faah42749_ZINC20192873_1_xBr27_refmac2_A_PR_00_0 finished
2013-07-15T14:58:25 CEST | World Community Grid | Starting task faah42750_ZINC57996886_1_xBr27_refmac2_A_PR_00_0 using faah version 715 in slot 0
2013-07-15T15:09:30 CEST | World Community Grid | Aborting task faah42749_ZINC44890379_1_xBr27_refmac2_A_PR_02_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T15:09:32 CEST | World Community Grid | Computation for task faah42749_ZINC44890379_1_xBr27_refmac2_A_PR_02_0 finished
2013-07-15T15:09:32 CEST | World Community Grid | Starting task faah42749_ZINC32771932_2_xBr27_refmac2_A_PR_00_0 using faah version 715 in slot 3
2013-07-15T15:30:45 CEST | World Community Grid | Aborting task faah42749_ZINC09312898_3_xBr27_refmac2_A_PR_02_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T15:30:47 CEST | World Community Grid | Computation for task faah42749_ZINC09312898_3_xBr27_refmac2_A_PR_02_0 finished
2013-07-15T15:30:47 CEST | World Community Grid | Starting task faah42749_ZINC24957938_1_xBr27_refmac2_A_PR_01_1 using faah version 715 in slot 2
2013-07-15T15:48:14 CEST | World Community Grid | Aborting task faah42750_ZINC57996886_1_xBr27_refmac2_A_PR_00_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T15:48:16 CEST | World Community Grid | Computation for task faah42750_ZINC57996886_1_xBr27_refmac2_A_PR_00_0 finished
2013-07-15T15:48:16 CEST | World Community Grid | Starting task faah42751_ZINC04035963_1_xBr27_refmac2_A_PR_02_0 using faah version 715 in slot 0
2013-07-15T15:59:21 CEST | World Community Grid | Aborting task faah42749_ZINC32771932_2_xBr27_refmac2_A_PR_00_0: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T15:59:38 CEST | World Community Grid | Computation for task faah42749_ZINC32771932_2_xBr27_refmac2_A_PR_00_0 finished
2013-07-15T15:59:38 CEST | World Community Grid | Starting task faah42751_ZINC05190540_1_xBr27_refmac2_A_PR_02_0 using faah version 715 in slot 3
2013-07-15T16:20:35 CEST | World Community Grid | Aborting task faah42749_ZINC24957938_1_xBr27_refmac2_A_PR_01_1: exceeded elapsed time limit 2689.10 (1012889.48G/376.66G)
2013-07-15T16:20:37 CEST | World Community Grid | Computation for task faah42749_ZINC24957938_1_xBr27_refmac2_A_PR_01_1 finished
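
A note on reading the figures in the "Aborting task" lines: the two numbers in parentheses look like a work bound (in giga floating-point operations) divided by the speed the client assumed for the task (in GFLOPS). That reading is an assumption, but the arithmetic agrees with the reported limit of 2689.10 seconds, which is just under 45 minutes. A minimal sketch of the check (the variable and function names are made up for illustration):

```python
# Figures copied from the "Aborting task" log lines.
# Interpreting them as bound / speed is an assumption, not confirmed in the thread.
FPOPS_BOUND_GFLOP = 1012889.48  # first number in parentheses
SPEED_GFLOPS = 376.66           # second number in parentheses
REPORTED_LIMIT_S = 2689.10      # limit printed in the message


def elapsed_limit_seconds(bound_gflop: float, speed_gflops: float) -> float:
    """Return the time limit implied by dividing the work bound by the speed."""
    return bound_gflop / speed_gflops


limit_s = elapsed_limit_seconds(FPOPS_BOUND_GFLOP, SPEED_GFLOPS)
print(f"computed limit: {limit_s:.2f} s = {limit_s / 60:.1f} min")
print(f"reported limit: {REPORTED_LIMIT_S:.2f} s")
```

The small difference between the computed and the reported value is what one would expect if the logged figures were rounded to two decimals.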


Normal computation of the next 4 WUs was resumed without a problem.
There is a problem coming up, though: there are now new FAAH WUs with an estimated runtime of 2 minutes and 15 seconds(!), and there are 197 of them(!).
Before they start, there are 16 other FAAH WUs waiting to run with an estimated runtime of 9 hours and 45 minutes and one CEP WU with an estimated runtime of 12 hours and 30 minutes.

I thought I should mention this because of the recommendation on http://boincfaq.mundayweb.com/index.php?language=1&view=190

Shall I abort all (nearly 200) of the "ready-in-2-or-3-minutes" jobs before they run, or is there someone for whom the short-lived WUs are useful information?
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jul 16, 2013 1:38:23 AM]
[Jul 16, 2013 1:29:52 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

If you have client 6.10.58: this is more or less fixed in v7, since WCG has implemented <dont_use_dcf>, meaning the client won't adjust its total-runtime projections for Ready to Start work. And yes, if a task is meant to run 2-3 minutes, it will 'exceed' the maximum runtime at a preset factor of 10 or 15 times the original estimate.
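
As a rough cross-check of that factor against the log in the first post: taking the factor of 10 or 15 at face value (it is the figure quoted here, not a verified one), the logged limit of 2689.10 seconds would correspond to an original estimate of about 3 to 4.5 minutes. A small sketch, with hypothetical names:

```python
# Limit taken from the "Aborting task" lines in the first post.
LIMIT_S = 2689.10

# Candidate factors as quoted in this reply; treated as assumptions.
CANDIDATE_FACTORS = (10, 15)


def implied_estimate_minutes(limit_s: float, factor: float) -> float:
    """Return the original runtime estimate (minutes) implied by limit = factor * estimate."""
    return limit_s / factor / 60


implied = {f: implied_estimate_minutes(LIMIT_S, f) for f in CANDIDATE_FACTORS}
for factor, minutes in implied.items():
    print(f"factor {factor}: implied estimate of about {minutes:.1f} min")
```

A factor of 15 lands close to the "2 to 3 minutes" estimate reported in the first post.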

I suspect, though it's only one of several possibilities, that the 'size to power' scheme blundered. It will kick in when the FAAH/SN2S Android device stream goes live... maybe they're testing, but the techs are surely not telling [what's new].

Thanks for reporting.
[Jul 16, 2013 7:24:44 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

Thanks Rob!
Maybe I should have written which version I'm using. It will become clear when I type something like this:
$ rpm -qa boinc*
boinc-client-7.0.65-1.git79b00ef.fc19.x86_64
boinc-manager-7.0.65-1.git79b00ef.fc19.x86_64


Adri
[Jul 16, 2013 8:49:37 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

Yup, there's another long-standing bug [reported multiple times], where the server communicates a new projected time for one task and everything else in the cache adopts it. It was mostly seen during HCC when run on both CPU and GPU, but I have/had it frequently on the Linux box with v7 [7.1.2x is actually in testing now]. Linux is much faster than Windows for the VINA-engine-based projects, but sometimes the slower Windows run time slips through and the TTC for the whole cache explodes out of proportion, typically to 4.5 hours; the next moment [usually after a repair task completes], everything drops back to 3 hours. These 2-3 minute estimates wreak havoc: suddenly you've got a pile. And now there's the hard-coded panic button coming in 7.2.4 and up... you can't have more than 1000 tasks per project [WCG is one project to BOINC], no matter what. There could be yet another cc_config override coming with that, so at least you'll be able to get the quota that WCG limits a device to [don't know what it is ATM]. WCG has an "in progress" quota set so that devices going bonkers [the server really being at fault, IMHO] won't overload... well, yours did anyway, it appears.

A true deep sigh.
[Jul 16, 2013 9:06:24 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

Great, I have just aborted some 600 FAAH WUs with a runtime of 2-3 minutes. After doing that I *finally* got reasonable WUs trickling in again (with a runtime of about 13 hours), which should be enough for a long (and warm) weekend's worth of computing.
[Jul 18, 2013 12:37:43 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

In total, from 2013-07-24T12:52:19 CEST till 2013-07-29T21:57:24 CEST I have had to abort 853 FAAH WUs with an estimated runtime of about 2 minutes and 15 seconds.
That's about 160 per day.
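
For anyone who wants to check that rate, here is a minimal sketch of the arithmetic using only the two timestamps and the count given in this post (both timestamps are CEST, so no time-zone conversion is needed; the variable names are only illustrative):

```python
from datetime import datetime

# Period and count as stated in this post.
start = datetime.fromisoformat("2013-07-24T12:52:19")
end = datetime.fromisoformat("2013-07-29T21:57:24")
aborted_wus = 853

# Length of the period in days, including the fractional part.
days = (end - start).total_seconds() / 86400
per_day = aborted_wus / days

print(f"period: {days:.2f} days")
print(f"aborted per day: {per_day:.1f}")
```

The result comes out slightly below 160 per day, so "about 160" is a fair rounding.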

Luckily, I also received 27 FAAH WUs (est. runtime 11 hours per task) and 1 CEP2 WU (est. 13 hours) today, as I had run out of tasks other than FAHV (of which there are some 250 in my queue right now with an est. average runtime of 1 hour).

Adri

Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz [Family 6 Model 23 Stepping 7]
OS: Linux: 3.9.9-302.fc19.x86_64
Memory: 3.86 GB physical, 7.81 GB virtual
Disk: 47.92 GB total, 26.07 GB free

boinc-client-7.0.65-1.git79b00ef.fc19.x86_64
boinc-manager-7.0.65-1.git79b00ef.fc19.x86_64
[Jul 29, 2013 8:47:13 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

Earlier this month I installed BOINC on a completely new system, and since then I haven't seen any FAAH WUs with an estimated runtime of about 2 minutes like the ones reported earlier in this thread.

Adri

Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz [Family 6 Model 60 Stepping 3]
OS: Linux: 3.11.4-201.fc19.x86_64
Memory: 7.49 GB physical, 7.51 GB virtual
Disk: 102.03 GB total, 29.08 GB free

boinc-client-7.0.65-1.git79b00ef.fc19.x86_64
boinc-manager-7.0.65-1.git79b00ef.fc19.x86_64
[Oct 23, 2013 3:51:26 PM]
kffitzgerald
Senior Cruncher
USA
Joined: Jan 29, 2011
Post Count: 222
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

After reading this thread I took a look: I am running 6.10.58 on my Windows Home Server 2011 (a stripped-down 2008 R2 version). Just WHAT version should I be running? Anyone?
[Oct 23, 2013 4:36:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error (exceeded elapsed time limit - after running for 45 minutes)

Hello kffitzgerald,
Since we are not running a GPU project, BOINC 7 is not mandatory. It seems OK, but we have not completed a security vet of the code, so it is not officially recommended. So you get your choice of official WCG BOINC 6.10.58 or BOINC 7.10.65/66. Or you can experiment with recent Beta releases.

Lawrence
[Oct 23, 2013 7:17:34 PM]