Total posts in this thread: 4
This topic has been viewed 827 times and has 3 replies
philperkins
Cruncher
Joined: Nov 9, 2005
Post Count: 11
Status: Offline
World Community Grid | Tasks won't finish in time: BOINC runs 99.9%% of the time; computation is enabled 100.0%% of that

Please can someone advise on why I get this friendly message and how to fix it? The 99.9% may vary and the worst seen to date is 99.6%.

This occurs on two similar PCs that are virtually devoted to WCG. The inconvenience is that I have to check the expiry times of the work regularly and abort 20 to 80 tasks when necessary. It appears that if I miss a deadline by a short time, the late work ends up awaiting either verification or validation, but I have not tracked any of it to see whether it was finally validated.

Both PCs use SuperMicro X8DTN+ mobos with dual hex-core CPUs (2 x X5679 and 2 x X5675). All video cards are EVGA/NVIDIA; the 5675 PC has a 640 + 660 and the other has 2 x 650ti. All run on PCI-E x8 buses as x16 is not available. NVIDIA and EVGA opinions vary on whether that matters, but in practice the 640:650ti:660 ratio is around 30:15:9 in worktime per dual units.

The 5679 system runs only HCC, so there are 22 CPUs and 2 GPUs working; the 5675 system is similar except that it runs HFCC on the CPUs and HCC on the GPUs. I note that recently some of the HFCC tasks take roughly twice as long as others. Why?

Despite some other reports, CPU usage is 100% but only GPU usage drops out at the 49.707/50.000% and 99.707/100.000% times. I would like to double the CPU work but, as far as I can see, the config file for that will trash the CPU work?

BOINC version is 7.0.28 and to avoid the repeat of "no GPU HCC work" I have a 7 day work buffer.

The 5675 PC seems to have inherited a brain from somewhere because, despite the %% message, it is actually working on the work that times out around a day later but it's too early to say if this will last.

Comments from similar users / fix for the % report / config to try 20 CPU + 4 GPU would be greatly appreciated.

Best Regards to all - Phil
[Jan 8, 2013 1:13:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: World Community Grid | Tasks won't finish in time: BOINC runs 99.9%% of the time; computation is enabled 100.0%% of that

You gave exactly the info needed for a quick diagnosis [a rarity in the "got a problem" department]. The deadline for HCC is 7 days and you run a 7 day cache/buffer, so you will at times see this message and not get work [overcached to the extreme]. Fix: reduce the buffer so the minimum is e.g. 5.0 days [if you're worried about running out] and the max additional buffer is 1.0, but the sum of these 2 parameters should *never* exceed, say, 80% of the shortest deadline your client receives tasks for. Why 80%? Because the runtimes vary and the total work buffer duration is recomputed each time a task finishes... get a long one, and BOINC assumes all that come after it also take that long. With 7 days it could suddenly become 8-9-10 days, and BOINC goes into complete panic state [Earliest Deadline First]. **
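The rule of thumb above can be checked with a little arithmetic. The following is a minimal sketch, not anything BOINC itself runs; the function name `buffer_within_rule` and its parameters are made up for illustration, and the 7.0 + 0.0 split for the original setup is an assumption, since only a "7 day work buffer" was reported.

```python
def buffer_within_rule(min_days, extra_days, shortest_deadline_days, fraction=0.8):
    """Hypothetical helper: compare the total work buffer with a fraction
    of the shortest task deadline (the "say 80%" rule of thumb)."""
    total = min_days + extra_days                # minimum buffer + additional buffer
    limit = fraction * shortest_deadline_days    # e.g. 80% of a 7-day deadline
    return total, limit, total <= limit


# HCC deadline of 7 days, as stated in the post.
# (7.0, 0.0) is an assumed split of the reported "7 day work buffer".
for min_days, extra_days in [(7.0, 0.0), (5.0, 1.0), (4.5, 1.0)]:
    total, limit, ok = buffer_within_rule(min_days, extra_days, 7.0)
    verdict = "within" if ok else "above"
    print(f"{min_days} + {extra_days} = {total:.1f} days, {verdict} the {limit:.1f} day limit")
```

Note that 5.0 + 1.0 comes to 6.0 days, a little above 80% of 7 days (5.6 days), so the 80% figure is best read as a rough guide rather than a hard threshold.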

Remember, as soon as a task goes overdue, a new copy is sent out to some other host, i.e. your copy is technically already redundant. When that other host completes the task quickly, your client gets a message to auto-abort your copies as long as they have not started. If they have started, you get that red line "consider to abort as credit is unlikely" (or words to that effect). Personally, I think BOINC could have a "never start overdue or bound to go overdue" option. I can't remember why that function is not active, but since some projects are perfectly happy to accept overdue tasks [CPDN e.g.], that could be one of the reasons why it's not general.

BTW, WCG maintains an "in-progress" limit per host. I can't remember the numbers but think it was something like ~4000 for GPU and a few hundred for CPU, one measure to stop clients going nuts on caching work [which officially is 10 days total per processor anyhow].

Not exhaustive, more the general outline.

** P.S. devices that have very little uptime could also receive your message. E.g. task takes 10 hours, device is only on 1 hour a day. Then the percentages are much wider apart.
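To put numbers on the uptime case in the P.S., here is a small sketch under a deliberately simplified model: wall-clock time is taken as compute time divided by the fraction of the day the machine is actually crunching. This is an assumption for illustration, not BOINC's scheduler logic, and the names `wall_clock_days`, `on_fraction` and `active_fraction` are hypothetical.

```python
def wall_clock_days(compute_hours, on_fraction, active_fraction=1.0):
    """Hypothetical estimate of the calendar days needed to finish a task.

    on_fraction     -- share of the time BOINC is running (0..1)
    active_fraction -- share of that time computation is enabled (0..1)
    """
    crunch_hours_per_day = 24.0 * on_fraction * active_fraction
    return compute_hours / crunch_hours_per_day


# A 10-hour task on a device that is on 1 hour a day: about 10 calendar days.
print(round(wall_clock_days(10, 1 / 24), 2))

# The same task on a host like Phil's (99.9% on, 100.0% enabled): under half a day.
print(round(wall_clock_days(10, 0.999, 1.0), 2))
```

On this model the percentages in the message barely matter for a near-dedicated host; the deadline pressure there comes from the size of the buffer.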
----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 8, 2013 2:52:13 PM]
[Jan 8, 2013 2:48:00 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7849
Status: Offline
Re: World Community Grid | Tasks won't finish in time: BOINC runs 99.9%% of the time; computation is enabled 100.0%% of that

BOINC version is 7.0.28 and to avoid the repeat of "no GPU HCC work" I have a 7 day work buffer.

I think part of the panic state mode to which you refer is due to the high buffer for HCC. This project has a 7 day turnaround time so you are at the max for this. If you are worried about running out of work I would suggest shortening this buffer to perhaps 4 or 5 days which should still be sufficient to get by any momentary shortages or outages but still avoid having BOINC go into panic mode. Left alone BOINC will eventually adjust to avoid the panic mode, but keep you well supplied.
Nanoprobe seems to be the expert on the GPU processing so perhaps he could offer some advice on the optimum BOINC version, app_info file and anything else he thinks is relevant to your GPU setup.
HFCC units have varied widely in duration for me. Some molecules are just more complex and take longer to process. This is not a problem with your machine.
Hope this helps.
Cheers

Edit: Sekerob types faster than I do and his explanation is more complete. :)
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 2 times, last edit by Sgt.Joe at Jan 8, 2013 2:58:52 PM]
[Jan 8, 2013 2:55:17 PM]
philperkins
Cruncher
Joined: Nov 9, 2005
Post Count: 11
Status: Offline
Re: World Community Grid | Tasks won't finish in time: BOINC runs 99.9%% of the time; computation is enabled 100.0%% of that

Thank you Rob and Sgt. Joe for such fantastic and quick replies. This service beats any UK helpline. I've now set them both to 5 days + 1 day so hopefully I can soon leave them to do their own things in peace.

It's ironic that I included every bit of information possible yet it was the single number 7 that was the key to my problem. I was envisaging some deep down hardware problem.

Agreed, Sekerob, about the "never start overdue or bound to go overdue" - it would be a great feature.

Thank you both for sorting out the big problem for me. I get the feeling, from reading many posts, that 20CPU/4GPU may not even be possible.

A Very grateful Phil
[Jan 8, 2013 4:14:56 PM]