| World Community Grid Forums
Thread Status: Active Total posts in this thread: 50
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Cio,
I'm stumped as to why one machine does perfectly with 5.8.2 and the other on 5.8.2 does not... It pulled 10 again and is depleting them, and will deplete them fully before tomorrow's lunchtime, whilst the buffer was set to 5.0 just for the exercise... The DCF continues to drop, so maybe there is a tipping point, plus changing that 6.25 to 100 and hitting update might do the trick... patience, patience. You choosing WCG as your sole project is definitely the best news of the day to a WCG-CA. Ciao
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Jan 15, 2007 5:00:11 PM]
||
|
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline Project Badges:
|
I was watching 5.7 closely when I was running it. It would go all the way to 0 before downloading anything new... so it would sit and wait until the first one was downloaded before starting to run it. That's what UD.EXE does now. I also had my buffer set to 5 and WCG processor at 100%, but it did not make any difference. Get 10, run 10, get 10, run 10.

What is the difference between your 2 machines? Operating system? Service pack updates?
SUPPORT ADVISOR
Work+GPU i7 8700 12threads School i7 4770 8threads Default+GPU Ryzen 7 3700X 16threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24threads Home i7 3540M 4threads 50%
[Edit 3 times, last edit by retsof at Jan 15, 2007 5:11:06 PM]
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
By grabbing 10 WUs and not getting any more before it's empty, it seems like you're overcommitted... One possibility is a too-large cache size; BOINC often doesn't like it if cache size > 1/2 deadline...

V5.8.2 has many logging capabilities. To enable detailed logging of work-request decisions, make a text file called cc_config.xml and place it in the main BOINC directory. cc_config.xml should include at least this, but can also include more flags:
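A minimal sketch of how such a file could be generated with Python. Only work_fetch_debug is named in this thread; the rr_simulation flag name (inferred from the [rr_sim] tags in the log output posted later) and the cc_config / log_flags nesting are assumptions to check against the BOINC client documentation for your version. The helper name build_cc_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each log flag set to 1 (on) or 0 (off).

    The <cc_config><log_flags>... layout is an assumption, not taken from the thread.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name, enabled in flags.items():
        ET.SubElement(log_flags, name).text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


# work_fetch_debug is mentioned in the thread; rr_simulation is an assumed flag name.
xml_text = build_cc_config({"work_fetch_debug": True, "rr_simulation": True})
print(xml_text)
```

Passing False for a flag writes a 0 instead, which matches the "change the ones to zeroes" step for switching the logging off again.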
After saving this file, in BOINC Manager's Advanced view, select Advanced, Read config file. With v5.8.1 and later, you no longer need to stop the BOINC core client to enable or disable debug flags. To disable them, edit cc_config.xml again, change the ones to zeroes, and use BOINC Manager again to re-read the config file.

If a project isn't contactable, work_fetch_debug will show this, and should also show why it's not contactable...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Edit 1 times, last edit by Ingleside at Jan 15, 2007 6:45:56 PM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Ok, thanks for the info! This was very helpful. I ran with the flags on and am seeing the following output:

"2007-01-15 13:51:26 [World Community Grid] [rr_sim] result faah1231_d105n643_x2BPZ_01_2 finishes after 168168.786883 (23573.056801/0.140175)
2007-01-15 13:51:26 [World Community Grid] [rr_sim] result faah1231_d105n643_x2BPZ_01_2 misses deadline by 118092.291263"

Does this mean that BOINC thinks that this work unit will miss the deadline and is therefore not pulling down more work? The work unit has a deadline of Jan 20 and is about to start in a few hours, so it will not even come close to missing the deadline.
||
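The numbers in the "finishes after" line quoted in the post above appear to be related by simple division: 23573.056801 / 0.140175 comes out at the reported 168168.786883 seconds. A minimal Python sketch that extracts the fields and checks this; the regular expression and the field names (remaining, rate) are illustrative assumptions, not BOINC's own terminology.

```python
import re

# Matches "[rr_sim] result NAME finishes after X (Y/Z)"; group names are hypothetical.
FINISH_RE = re.compile(
    r"result (?P<name>\S+) finishes after (?P<finish>[\d.]+) "
    r"\((?P<remaining>[\d.]+)/(?P<rate>[\d.]+)\)"
)


def parse_finish_line(line):
    """Return the numeric fields of a 'finishes after' log line, or None if no match."""
    m = FINISH_RE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "finish": float(m.group("finish")),
        "remaining": float(m.group("remaining")),
        "rate": float(m.group("rate")),
    }


line = ("2007-01-15 13:51:26 [World Community Grid] [rr_sim] result "
        "faah1231_d105n643_x2BPZ_01_2 finishes after 168168.786883 "
        "(23573.056801/0.140175)")
info = parse_finish_line(line)

# Seconds of work left divided by the assumed processing rate.
print(info["remaining"] / info["rate"])
# The same finish time expressed in hours.
print(info["finish"] / 3600)
```

The same pattern also matches the pipe-separated log format that appears later in the thread, since it only looks at the text from "result" onwards.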
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
"Just saw 2 'I never noticed before' values in the BOINC Default profile:
Use no more than: 6.25 % of processor time
Use no more than: 100.0 % of total memory"

Hmm, if you haven't typed it in yourself, no idea why you've got 6.25%; it's possible WCG has made an :oopsie:, but the default should be 100%.

In any case, the full text for these preferences is:
"Use at most N percent of CPU time"; enforced by version 5.6 and greater (default 100%).
"Use at most N % of memory when computer is in use"; enforced by version 5.8 and greater (default 50%).
"Use at most N % of memory when computer is idle"; enforced by version 5.8 and greater (default 90%).

WCG is apparently only using #1 and #3 of these...

But basically, with a setting of 6.25% CPU time (1/16), you'll crunch roughly 1 second and then be idle for about 15 seconds...
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Retsof, there are 2 differences. Beforehand, I should mention that it worked perfectly fine on 5.8.1 with a 2.5-day buffer and started depleting from the moment I upgraded to 5.8.2.
1. The well-functioning P4 is single thread, versus the obstinate C2D on 2 threads.
2. The C2D is only running Rosetta (5% weight) and WCG (95% weight). Rosetta is suspended.
The WCG DCF has now dropped to 1.3, so I'll let it sit a bit more without altering things.

Ingleside, thanks for the tip. I've used the cc_config.xml file before and will activate it with just the one line to see what's going on. Reading the settings files without an exit and restart is a great improvement. One less cause of checkpoint reverting. Thanks
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I think I'm seeing some light here. I'm running FAAH too, as sole project on the resisting machine; the other one does not run FAAH, just HDC and GC. There were duration estimate changes made late last week, coinciding with me upgrading to 5.8.2... did something go awry there?

15-1-2007 20:29:57|World Community Grid|[rr_sim] result faah1235_d112n854_x2BPZ_01_1 finishes after 25726.520504 (20865.772013/0.811061)
15-1-2007 20:29:57|World Community Grid|[rr_sim] result faah1235_d112n854_x2BPZ_01_1 misses deadline by 88402.631325
15-1-2007 20:31:05||rr_simulation: end; total shortfall 629778.597587
15-1-2007 20:31:05||[work_fetch_debug] compute_work_requests(): cpu_shortfall 629778.597587, overall urgency Need

Definitely the best diagnostic tip for baffled BOINC users.

Update: Attached the machine to SIMAP (periodic project), and it started pulling down work without much ado, confirming the install does not think it's overcommitted.
[Edit 2 times, last edit by Sekerob at Jan 15, 2007 7:41:24 PM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I am not seeing the work unit problem on my other profiles, which are set up to receive work from multiple WCG projects. My "work profile" is set up to receive FAAH and human genome work. All of my machines are only running WCG projects.
Please let me know if I can collect any other diagnostic info to help figure this one out! I think WCG is a really worthy cause and am glad to help!
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
Yes, as long as a project has one or more results flagged "misses deadline", BOINC will not ask that same project for more work, except if you've got an idle CPU... But as long as the expected run-time of all cached work isn't larger than the cache size, you can still ask another project for more work, as Sekerob successfully did.

As for why BOINC thinks "misses deadline", not sure, but let's try...

Now, 5 days = 120 hours. You're using a 2.5-day cache? That is 60 hours.
120 hours - (60 hours + 24 hours (safety)) = 36 hours till the work must be finished.

Also, according to rr_sim, the result will take 168168 seconds, or 46.7 hours. Since 46.7 hours > 36 hours, you've missed the deadline...

But to get the 46.7-hour run-time, rr_sim also used 0.140175, and I'm not quite sure how this is calculated...

Hmm, a quick test: if all CPUs are active and there is no cached work, rr_sim uses on_frac * active_frac * cpu_efficiency, so far so good... With 1 other runnable result for another project with the same resource share, it switched to 1/2 and 1/2...

So, other projects? Or a low active_frac, on_frac or cpu_efficiency? These are in client_state.xml, and are close to 1 if the machine crunches 24/7 and isn't doing any other CPU-intensive work.
Anyway, just as a "fun" calculation at the end, not exactly sure if it's completely correct, but for someone trying to run with a 5-day cache, you'll likely get:
7 days deadline - (5 days + 1 day safety) = 1 day till deadline.
In other words, the second the expected run-time goes over 1 day on the client, you're blocked from asking for more work from this project... Or maybe even earlier, since, to be on the safe side, the client assumes the estimates are a little worse to guard against wrong estimates...

With a 2-day cache on the other hand:
7 days deadline - (2 days + 1 day safety) = 4 days till deadline.

So, surprisingly, you'll likely have a larger cache of WCG work if the cache size is set to 2 days than if you set it to 5 days...
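The back-of-the-envelope rule used in this post (deadline minus cache size minus one day of safety margin) can be sketched in a few lines of Python. This only reproduces the poster's rule of thumb, not the actual BOINC scheduling code, and the function names are hypothetical.

```python
def hours_until_blocked(deadline_days, cache_days, safety_days=1.0):
    """Slack per the rule of thumb: deadline - (cache + safety), in hours."""
    return (deadline_days - (cache_days + safety_days)) * 24.0


def would_be_blocked(est_runtime_s, deadline_days, cache_days, safety_days=1.0):
    """True if the estimated run-time (in seconds) exceeds the available slack."""
    slack_h = hours_until_blocked(deadline_days, cache_days, safety_days)
    return est_runtime_s / 3600.0 > slack_h


print(hours_until_blocked(5, 2.5))   # 36.0 hours: 5-day deadline, 2.5-day cache
print(hours_until_blocked(7, 5))     # 24.0 hours: 7-day deadline, 5-day cache
print(hours_until_blocked(7, 2))     # 96.0 hours: 7-day deadline, 2-day cache
print(would_be_blocked(168168.786883, 5, 2.5))  # True: 46.7 h > 36 h
```

Under this rule a smaller cache setting leaves more slack before a result is flagged, which is the counter-intuitive point made above.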
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for another very helpful and insightful posting! I think you have hit the root cause of the problem on my machine. This machine is a laptop that I suspend overnight and at some points over the weekend, unlike three of my other machines, which are running 24x7.
The values in client_state.xml for this machine are:
<on_frac>0.946552</on_frac>
<connected_frac>0.093653</connected_frac>
<active_frac>0.728136</active_frac>
<cpu_efficiency>0.248532</cpu_efficiency>
It looks like the root cause is the low CPU efficiency being calculated for this machine. This number looks out of sync with the completion speed of the work units. Is there any way to check how it is being calculated?
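As a rough check, these values can be plugged into the on_frac * active_frac * cpu_efficiency product mentioned earlier in the thread. The Python sketch below does this; the wrapper element name is arbitrary and only there so the fragment parses. The product comes out near 0.17, in the same ballpark as the 0.140175 seen in the rr_sim line but not identical, and nothing in the thread establishes where the difference comes from.

```python
import xml.etree.ElementTree as ET

# Values as posted above; the fragment has no single root element.
snippet = """
<on_frac>0.946552</on_frac>
<connected_frac>0.093653</connected_frac>
<active_frac>0.728136</active_frac>
<cpu_efficiency>0.248532</cpu_efficiency>
"""

# Wrap in a dummy root so ElementTree accepts the fragment (name is arbitrary).
root = ET.fromstring("<stats>" + snippet + "</stats>")
values = {child.tag: float(child.text) for child in root}

# Product described in the thread for the no-other-work case.
rate = values["on_frac"] * values["active_frac"] * values["cpu_efficiency"]
print(round(rate, 4))

# For comparison: the same product if cpu_efficiency were 1.0.
print(round(values["on_frac"] * values["active_frac"], 4))
```

The second figure shows how much of the reduction is due to cpu_efficiency alone, which is the value the poster suspects.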
||
|
|
|