World Community Grid Forums
Category: Completed Research Forum: FightAIDS@Home Phase 2 Thread: FightAIDS@Home - Phase 2 AsyncRE WU Limit |
Thread Status: Active Total posts in this thread: 9
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Is it time to relax the one-per-core limit for these yet?
Everything seems to be going really well and I, for one, would like to see some relaxation in the one-per-core limit now. Even ncpus+1 would help ... Thanks. |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 868 Status: Offline Project Badges: |
It would be quite nice to be able to get a bigger feed of these tasks, but as fast turn-around is important I'm not sure whether a significant change in feeding would be a good idea.
About 8% of the 50000-step jobs I'm getting are resends because the first recipient has timed out. Sometimes the first retry has also timed out (so that's a 48-hour delay)! And I'm even seeing about 1% resends-on-timeout for the 10000-step jobs, so some folks probably have quite large buffers or are running less than 24/7, and even the one-per-core limitation isn't working out... On the larger jobs, "timed out" has passed "Error - a required privilege is not held" as the major source of retries that I'm receiving!
As there are now (it seems) two identical sets of tasks, one all 10000-step, the other all 50000-step, it might be interesting to find out how many retries are needed for each size of task! If, as I suspect, the number of timed-out tasks has risen significantly for the longer tasks, I don't know how they can strike a fair balance between keeping faster crunchers fed and avoiding larger numbers of 24-hour delays because of time-outs.
----------------------------------------[Edit 1 times, last edit by alanb1951 at Jan 4, 2018 2:03:05 AM] |
||
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2384 Status: Offline Project Badges: |
I just counted the number of FAH2 WUs running and it's one per thread, NOT one per core. What is this "one-per-core limit"?
----------------------------------------...KRI please cancel all shadow-banning |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I can guess the reason for the one-per-core-thread plus 1 quota, but explaining the thinking to those less 'in' would help make it understood by all. I would certainly recommend setting the <report_results_immediately/> flag in app_config.xml for fahb to get the swiftest backfill, if so desired. It does require the newest client, i.e. not the WCG-skinned one.
It's easy to overcome, BTW: just fake having ncpus+1 in cc_config.xml, then set max_concurrent in app_config.xml to the true thread count, if only doing this science. Of course, in time the 1 extra will have aged about half of the average runtime before starting, i.e. not a good idea for slower devices.
----------------------------------------[Edit 1 times, last edit by Former Member at Jan 4, 2018 12:10:08 PM] |
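For anyone wanting to try the ncpus+1 trick described above, here is a rough sketch of the two files it involves, generated with a small Python script. The function names are made up for illustration, and the app name fahb is taken from the post. The tag names (ncpus under options in cc_config.xml; max_concurrent and report_results_immediately under app in app_config.xml) follow the BOINC client configuration documentation as I understand it, so check them against your client version before relying on this.

```python
import os
import xml.etree.ElementTree as ET


def build_cc_config(true_threads: int) -> str:
    """Return cc_config.xml text that advertises one more CPU than the host has.

    Hypothetical helper: the +1 is the 'fake ncpus' trick from the post.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(true_threads + 1)
    return ET.tostring(root, encoding="unicode")


def build_app_config(true_threads: int, app_name: str = "fahb") -> str:
    """Return app_config.xml text capping concurrent tasks at the real thread count.

    Hypothetical helper: app_name defaults to 'fahb' as mentioned in the post.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    # Only run as many tasks at once as there are real threads.
    ET.SubElement(app, "max_concurrent").text = str(true_threads)
    # Empty element: ask the client to report finished results right away.
    ET.SubElement(app, "report_results_immediately")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    threads = os.cpu_count() or 1
    print(build_cc_config(threads))
    print(build_app_config(threads))
```

The script only prints the XML; it does not write anything. As far as I know, cc_config.xml belongs in the BOINC data directory and app_config.xml in the project's own folder under it, and the client needs to re-read its config files (or be restarted) before either takes effect.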
||
|
Trotador
Senior Cruncher Joined: Mar 26, 2009 Post Count: 154 Status: Offline Project Badges: |
Actually, there is a maximum of 64 units per host regardless of the available threads, which I'm seeing on my hosts with 88 and 72 threads. So I'm not sure whether the ncpus trick will work or not.
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If 64 is the observed hard limit per client, then no amount of tricking will get you more, other than setting up 2 concurrent clients.
|
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges: |
Sorry if this has been answered somewhere else, but does anybody know why there is a one-per-core limit? (I asked this in another thread, but didn't get an answer.) Just curious.
----------------------------------------Thanks, CJSL Crunching like it's going out of style... |
||
|
pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline Project Badges: |
Currently have 1 per core +1 of WUs since the update.
----------------------------------------Patrick |
||
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 746 Status: Offline Project Badges: |
Quoting pcwr: "Currently have 1 per core +1 of WUs since the update. Patrick"
Since what update? I'm only seeing 1 per core still.
|
||
|
|