Thread Status: Active
Total posts in this thread: 26
This topic has been viewed 71822 times and has 25 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

Let me take this one as an example to show why the Server Abort took place (results sorted by order of generation; the default HCC deadline is exactly 7 days**):

1) This original, one of 2 copies, was overdue at 03/03/11 06:07:58, causing the _2 copy to be sent out upon "No Reply". It still reported at 07:08:29, about an hour late:

X0000063880291200601271046_ 0-- 608 Valid 21/02/11 06:07:58 03/03/11 07:08:29 2.91 46.4 / 42.0

2) This copy, the second of the 2 originals, came back in 4 days... not swift, but no issue:

X0000063880291200601271046_ 1-- 608 Valid 21/02/11 06:08:39 25/02/11 00:44:50 3.09 37.5 / 42.0

3) This is the repair job sent out at 06:09:32 because of the "No Reply" on 1). It was not started immediately, and when 1) still reported, albeit late, the server flagged the task so that the repair wingman (a trusted client) would not process it. Thus a "Server Abort" was signalled to this client at 11:59:55, the next time the host contacted the servers.

X0000063880291200601271046_ 2-- 640 Server Aborted 03/03/11 06:09:32 03/03/11 11:59:55 0.00 0.0 / 0.0

All is well: some extra intertube bandwidth was used, but no redundant crunch time went to the waist (pun intended).

** Hmmm, 21/02 to 03/03 is 10 days, not 7 (February 2011 had 28 days). Did the standard deadline for HCC change back from 7 to 10 days?
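The dates in the three result rows above can be checked with a short Python sketch. It assumes the timestamps are DD/MM/YY (as the 21/02/11 to 03/03/11 span suggests) and that the original's deadline is the moment it went overdue (03/03/11 06:07:58); both are readings of this post, not documented WCG behaviour.

```python
from datetime import datetime

# Timestamps copied from the result rows above, assumed to be DD/MM/YY.
FMT = "%d/%m/%y %H:%M:%S"

sent_original = datetime.strptime("21/02/11 06:07:58", FMT)
deadline = datetime.strptime("03/03/11 06:07:58", FMT)   # when 1) went overdue
returned_original = datetime.strptime("03/03/11 07:08:29", FMT)
repair_sent = datetime.strptime("03/03/11 06:09:32", FMT)
repair_aborted = datetime.strptime("03/03/11 11:59:55", FMT)

# Length of the deadline window: 7 days or 10 days?
window_days = (deadline - sent_original).days

# How late the original reported, in minutes.
late_minutes = (returned_original - deadline).total_seconds() / 60

# Did the late original arrive after the repair went out but before the abort?
overlap = repair_sent < returned_original < repair_aborted

print(f"deadline window: {window_days} days")                 # 10 days
print(f"original reported {late_minutes:.1f} minutes late")   # 60.5 minutes
print(f"late return between repair send and abort: {overlap}")  # True
```

The 10-day window is what prompts the footnote question, and the overlap check is the condition under which the repair copy became redundant.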

Hope this clarifies the mud a bit

--//--
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 4, 2011 3:21:07 PM]
[Mar 4, 2011 3:19:53 PM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1460
Status: Offline
Re: no new work

SekeRob, 32 of my cores were sitting dry for several hours... There is only 512 MB of RAM per 16 cores, so I had only HCC ticked and not "load work from other machines".
----------------------------------------
[Mar 4, 2011 3:20:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

rilian, you can trick the client a little. HCMD2 is also small, as is C4CW. By limiting the RAM BOINC may use on the host, both when in use and when idle, and ticking "If there is no work...", the client should fetch only the small/lighter sciences and not the biggies... so goes the theory. I'd be interested to know whether that works, so I'm volunteering you to test it ;P

If the trick works, I'll port it into an FAQ.

--//--

PS: "load work from other machines" I haven't found as an option yet, but we know what you meant ;o)
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 4, 2011 3:29:16 PM]
[Mar 4, 2011 3:28:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

Thanks for the clarification, SekeRob. Just a quick question: is there any info in the result name that tells you it is a repair WU, or is the fact that there is a third WU enough to tell you that it is a repair WU?

cheers
[Mar 4, 2011 3:34:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

And a suggestion for the techs: have the "If there is no work..." option send work only when the number of tasks "In Progress" assigned to the host drops to 1 per core or fewer. That way, once the preferred work is available again, those hosts would return in short order to the sciences the member has selected for the host(s)/profile(s).

Would that work? I think this would increase members' willingness to select this "recommended" alternate work supply.
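The proposed rule could look something like this Python sketch. It is purely hypothetical: `should_send_alternate_work` and its parameters are illustrative names, not WCG or BOINC scheduler code, and the "1 per core" cap is taken straight from the suggestion above.

```python
def should_send_alternate_work(preferred_available: bool,
                               in_progress: int,
                               cores: int,
                               opted_in: bool) -> bool:
    """Hypothetical server-side check for the "If there is no work..." option.

    Alternate (non-selected) work is sent only when the member opted in,
    no preferred work is available, and the host holds at most 1 task
    "In Progress" per core.
    """
    if not opted_in or preferred_available:
        # Either the member never asked for alternates, or the
        # preferred science can be served instead.
        return False
    return in_progress <= cores


# A 16-core host with 20 tasks in progress gets no alternate work...
print(should_send_alternate_work(False, 20, 16, True))  # False
# ...but once it drains to 16 or fewer, alternate work flows again.
print(should_send_alternate_work(False, 16, 16, True))  # True
```

Because the cap is tied to the core count, a host never builds a deep cache of alternate work, which is what would let it switch back quickly once preferred work returns.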

--//--
[Mar 4, 2011 3:36:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

Assuming you know the standard quorum size, you can tell by the suffix, and of course by the "rush" deadline of 4 days or shorter. A suffix of _2 for HCC is a dead giveaway. For zero-redundancy sciences, those that normally have no wingman, it's less obvious, as sometimes an extra wingman is sent out at the same time, so I'd say the shorter deadline is the more reliable indicator.

Because I run with a cache of nearly 2 days, my hosts remain reliable and still receive repair jobs, but without my pushing them ahead in the queue. That gives the "slow boat to China" cruncher a little more grace time to complete the task, and the servers then send my hosts an "it's redundant, don't bother" Server Abort.

I much prefer the BOINCTasks utility because it also has an assignment-date column, which the standard BOINC Manager does not. That makes it more obvious to me which tasks have short deadlines.
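The suffix-and-deadline rule of thumb above can be written down as a small Python heuristic. It assumes result names end in `_<copy index>` (as in `X0000063880291200601271046_2`), that the science's standard quorum size is known, and that a deadline window of 4 days or less marks a rush job; `looks_like_repair` is a hypothetical helper, not part of any WCG tool.

```python
from datetime import datetime, timedelta


def copy_index(result_name: str) -> int:
    # The copy index is the number after the last underscore, e.g. "..._2" -> 2.
    return int(result_name.rsplit("_", 1)[1])


def looks_like_repair(result_name: str,
                      quorum_size: int,
                      sent: datetime,
                      due: datetime,
                      rush_limit: timedelta = timedelta(days=4)) -> bool:
    """Guess whether a task is a repair (resend) job."""
    short_deadline = (due - sent) <= rush_limit
    if quorum_size <= 1:
        # Zero-redundancy sciences sometimes send an extra wingman up front,
        # so the suffix is unreliable; rely on the deadline alone.
        return short_deadline
    # With a quorum of 2, copies _0 and _1 are the originals; _2 or above is extra.
    beyond_quorum = copy_index(result_name) >= quorum_size
    return beyond_quorum or short_deadline


sent = datetime(2011, 3, 3, 6, 9, 32)   # repair copy's send time, from row 3
due = sent + timedelta(days=4)          # hypothetical: its due date isn't shown
print(looks_like_repair("X0000063880291200601271046_2", 2, sent, due))  # True
```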

--//--
[Mar 4, 2011 3:47:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

Testing the memory-limiting trick: indeed, it works. I took my duo and reduced the in-use/idle memory limit to 5%, which set the RAM allowance down to 102 MB. I selected HFCC & DDDT2, ticked "If there is no work...", then upped the cache. The log below shows that not enough memory was allowed for HFCC & DDDT2 (simultaneously revealing the difference between the System Requirements spec pages and the hard RAM minimum). Only the small-footprint alternate sciences were fetched (HCC/HCMD2).

11580 05-03-2011 13:47 Preferences:
11581 05-03-2011 13:47 max memory usage when active: 102.31MB
11582 05-03-2011 13:47 max memory usage when idle: 102.31MB
11583 05-03-2011 13:47 max disk usage: 10.00GB
11584 05-03-2011 13:47 (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
11585 WCG 05-03-2011 13:47 [sched_op] Starting scheduler request
11586 WCG 05-03-2011 13:47 Sending scheduler request: To fetch work.
11587 WCG 05-03-2011 13:47 Requesting new tasks for CPU
11588 WCG 05-03-2011 13:47 [sched_op] CPU work request: 31872.23 seconds; 0.00 CPUs
11589 WCG 05-03-2011 13:47 Scheduler request completed: got 3 new tasks
11590 WCG 05-03-2011 13:47 [sched_op] Server version 601
11591 WCG 05-03-2011 13:47 No work can be sent for the applications you have selected
11592 WCG 05-03-2011 13:47 No work is available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
11593 WCG 05-03-2011 13:47 Help Fight Childhood Cancer needs 119.21 MB RAM but only 102.31 MB is available for use.
11594 WCG 05-03-2011 13:47 Discovering Dengue Drugs - Together - Phase 2 needs 750.00 MB RAM but only 102.31 MB is available for use.
11595 WCG 05-03-2011 13:47 You have selected to receive work from other applications if no work is available for the applications you selected
11596 WCG 05-03-2011 13:47 Sending work from other applications
11597 WCG 05-03-2011 13:47 Project requested delay of 11 seconds
11598 WCG 05-03-2011 13:47 [sched_op] estimated total CPU task duration: 50682 seconds
11599 WCG 05-03-2011 13:47 [sched_op] Deferring communication for 11 sec
11600 WCG 05-03-2011 13:47 [sched_op] Reason: requested by project
11601 WCG 05-03-2011 13:47 Started download of wcg_hcc1_img_6.40_windows_intelx86
11602 WCG 05-03-2011 13:47 Started download of wcg_hcc1_img_graphics_6.40_windows_intelx86
11603 WCG 05-03-2011 13:47 Finished download of wcg_hcc1_img_graphics_6.40_windows_intelx86
11604 WCG 05-03-2011 13:47 Started download of hcc1_image04_6.40.tga
11605 WCG 05-03-2011 13:47 Finished download of wcg_hcc1_img_6.40_windows_intelx86
11606 WCG 05-03-2011 13:47 Finished download of hcc1_image04_6.40.tga
11607 WCG 05-03-2011 13:47 Started download of hcc1_image03_6.40.tga
11608 WCG 05-03-2011 13:47 Started download of hcc1_image02_6.40.tga
11609 WCG 05-03-2011 13:48 Finished download of hcc1_image03_6.40.tga
11610 WCG 05-03-2011 13:48 Finished download of hcc1_image02_6.40.tga
11611 WCG 05-03-2011 13:48 Started download of hcc1_image01_6.40.tga
11612 WCG 05-03-2011 13:48 Started download of X0000065240731200602242024_X0000065240731200602242024.jp2
11613 WCG 05-03-2011 13:48 Finished download of hcc1_image01_6.40.tga
11614 WCG 05-03-2011 13:48 Finished download of X0000065240731200602242024_X0000065240731200602242024.jp2
11615 WCG 05-03-2011 13:48 Started download of X0000065240427200602242029_X0000065240427200602242029.jp2
11616 WCG 05-03-2011 13:48 Started download of 583efc9bc28523c3f2e0a9647b3b8936.dat.gzb
11617 WCG 05-03-2011 13:48 Finished download of X0000065240427200602242029_X0000065240427200602242029.jp2
11618 WCG 05-03-2011 13:48 Finished download of 583efc9bc28523c3f2e0a9647b3b8936.dat.gzb
11619 WCG 05-03-2011 13:48 Started download of cbfbb23e5ae6f9c81628dad4bab38e8d.dat.gzb
11620 WCG 05-03-2011 13:48 Started download of 5be7af669ed63f33b54771c812958e27.pdb.gzb
11621 WCG 05-03-2011 13:48 Finished download of cbfbb23e5ae6f9c81628dad4bab38e8d.dat.gzb
11622 WCG 05-03-2011 13:48 Finished download of 5be7af669ed63f33b54771c812958e27.pdb.gzb
11623 WCG 05-03-2011 13:48 Started download of 93f4e8307bf057ebd259191837a37a6c.pdb.gzb
11624 WCG 05-03-2011 13:48 Started download of b416fac6f940515859406b3d7fb2f4dd.dat.gzb
11625 WCG 05-03-2011 13:48 Finished download of 93f4e8307bf057ebd259191837a37a6c.pdb.gzb
11626 WCG 05-03-2011 13:48 Finished download of b416fac6f940515859406b3d7fb2f4dd.dat.gzb
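The 102.31 MB in the log is simply 5% of the host's memory. A quick Python check, assuming BOINC computes the in-use/idle limits as a percentage of the physical RAM it detects, recovers roughly 2 GB for the duo; `PCT` and `LIMIT_MB` come from the post and the "max memory usage" log lines, while `pct_for` is an illustrative helper.

```python
PCT = 5.0          # memory limit set in the preferences, percent
LIMIT_MB = 102.31  # "max memory usage when active/idle" from the log

# limit = pct/100 * physical_ram  =>  physical_ram = limit / (pct/100)
physical_ram_mb = LIMIT_MB / (PCT / 100)
print(f"implied physical RAM: {physical_ram_mb:.1f} MB")  # 2046.2 MB


def pct_for(target_mb: float, ram_mb: float) -> float:
    """Percentage setting needed to reach a target allowance on this host."""
    return 100 * target_mb / ram_mb


# For example, the percentage that would give a 118 MB allowance:
print(f"118 MB needs about {pct_for(118, physical_ram_mb):.2f}%")
```

Since the preference is a percentage, the same setting yields a different allowance on every host, so a target in MB has to be converted per machine.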

In the interim, to learn the hard limits for the other sciences, I expanded the science selection and got this:

11675 WCG 05-03-2011 14:16 The Clean Energy Project - Phase 2 needs 750.00 MB RAM but only 102.31 MB is available for use.
11676 WCG 05-03-2011 14:16 Help Fight Childhood Cancer needs 119.21 MB RAM but only 102.31 MB is available for use.
11677 WCG 05-03-2011 14:16 Computing for Clean Water needs 384.00 MB RAM but only 102.31 MB is available for use.
11678 WCG 05-03-2011 14:16 Human Proteome Folding - Phase 2 needs 171.66 MB RAM but only 102.31 MB is available for use.
11679 WCG 05-03-2011 14:16 FightAIDS@Home needs 119.21 MB RAM but only 102.31 MB is available for use.
11680 WCG 05-03-2011 14:16 You have selected to receive work from other applications if no work is available for the applications you selected

So effectively, by limiting the memory allowance to 118 MB and selecting HCC plus "If there is no work...", you can arrange to receive only HCMD2 as the alternate. Set it to 380 MB and only the medical sciences would be received, though regrettably DDDT2 would then not come along, a slim chance at present anyway.
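The threshold reasoning above amounts to a filter over the minima the scheduler printed. A Python sketch, using only the RAM figures from the two log excerpts (HCC and HCMD2 are left out because their minima were never printed; they evidently fit under 102.31 MB). `eligible` is a hypothetical helper mimicking the "needs X MB but only Y MB is available" check, not the actual scheduler code.

```python
# Minimum RAM per science, in MB, as reported by the WCG scheduler in the logs.
MIN_RAM_MB = {
    "Help Fight Childhood Cancer": 119.21,
    "FightAIDS@Home": 119.21,
    "Human Proteome Folding - Phase 2": 171.66,
    "Computing for Clean Water": 384.00,
    "The Clean Energy Project - Phase 2": 750.00,
    "Discovering Dengue Drugs - Together - Phase 2": 750.00,
}


def eligible(allowance_mb: float) -> list[str]:
    """Sciences whose printed minimum fits within the BOINC memory allowance."""
    return sorted(name for name, need in MIN_RAM_MB.items() if need <= allowance_mb)


for allowance in (102.31, 118, 380):
    print(allowance, eligible(allowance))
```

At 118 MB none of the listed sciences qualify, leaving only the small ones such as HCMD2 as alternates; at 380 MB HFCC, FightAIDS@Home and HPF2 join, while DDDT2 (750 MB) stays out, matching the observation above.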

Now that the exact minima have been printed for most sciences, I have to see whether these sciences will run concurrently in any pairing without invoking a "waiting for memory" state, which I fear is the weakness in this strategy.
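Whether two sciences can run side by side on a dual-core host without a "waiting for memory" state can be roughed out by checking that the sum of their minima fits the allowance. This Python sketch assumes each running task uses about its scheduler-printed minimum (real working sets vary), reuses the log figures (FightAIDS@Home and Human Proteome Folding - Phase 2 abbreviated as FAAH and HPF2), and `fits_together`/`pairs_that_fit` are illustrative names.

```python
from itertools import combinations_with_replacement

# Scheduler-printed minima (MB) from the log; actual task working sets may differ.
MIN_RAM_MB = {
    "HFCC": 119.21,
    "FAAH": 119.21,
    "HPF2": 171.66,
    "C4CW": 384.00,
}


def fits_together(pair: tuple[str, str], allowance_mb: float) -> bool:
    """True if both tasks can be in memory at once under the allowance."""
    return sum(MIN_RAM_MB[name] for name in pair) <= allowance_mb


def pairs_that_fit(allowance_mb: float) -> list[tuple[str, str]]:
    # A dual-core host runs two tasks, possibly of the same science.
    return [p for p in combinations_with_replacement(sorted(MIN_RAM_MB), 2)
            if fits_together(p, allowance_mb)]


print(pairs_that_fit(380))
```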

ttyl
[Mar 5, 2011 1:24:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: no new work

PS: The temporary HCC shortage knocked on to Saturday's validations... some 35,000 down from Friday, according to the projection based on the morning data. All other sciences point up, but that could be an after-effect of the server troubles of the past days... delayed returns and validations.

--//--
[Mar 5, 2011 1:27:57 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: no new work

Now that the exact minima have been printed for most sciences, I have to see whether these sciences will run concurrently in any pairing without invoking a "waiting for memory" state, which I fear is the weakness in this strategy.

Yes, with low memory settings you'll successfully download the work, but you can't run it on all the cores, since there isn't enough memory for that. If rilian has a 16-core system, as his message at least indicates, I would expect he'll need to set the memory limit to at least 1 GB, even when running the smallest-memory WCG project, just to keep all cores loaded.

Also, if BOINC starts to hit the memory limit, there's a chance some tasks will be removed from memory regardless of LAIM ("Leave applications in memory while suspended") being on, which can lose a significant amount of time if there is a long interval between checkpoints.
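The core-count point reduces to simple arithmetic: the number of tasks that can run at once is capped both by the core count and by the allowance divided by the per-task footprint. A Python sketch, assuming every task has the same footprint (`per_task_mb` is a hypothetical input; real tasks vary, and the smallest WCG science's figure isn't given in this thread), with HFCC's 119.21 MB from the log used as the example.

```python
import math


def running_tasks(cores: int, allowance_mb: float, per_task_mb: float) -> int:
    """Tasks that can be in memory at once, assuming equal-sized tasks."""
    return min(cores, math.floor(allowance_mb / per_task_mb))


def allowance_to_fill(cores: int, per_task_mb: float) -> float:
    """Memory allowance needed to keep every core busy."""
    return cores * per_task_mb


# 16 cores at a 102.31 MB allowance, with HFCC-sized (119.21 MB) tasks:
print(running_tasks(16, 102.31, 119.21))              # 0
print(f"{allowance_to_fill(16, 119.21):.2f} MB")       # 1907.36 MB
```

Even a science at half that footprint would still need close to 1 GB across 16 cores, which is consistent with the estimate above.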
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Mar 5, 2011 2:06:36 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: no new work

I would put these problems down to the server troubles. I ran out of WCG tasks on several systems, for some time. Typically BOINC either got no new work (leaving me dry), or backed off from downloading and then got resends (which also backed off) for lost tasks. It might also have messed with BOINC, as some other projects were not downloading automatically at times. Upping the cache and doing manual updates worked for the other projects but not for WCG tasks (with varying numbers of projects selected on different systems).
[Mar 5, 2011 2:16:46 PM]