| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 3593
|
|
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I very nearly started another thread with a discussion on reliability, and maybe I should have. If so, I apologise.
Let me just say that I absolutely understand and agree with the scientists' and techs' desire to get results back quickly sometimes. That's a given! What I have a problem with is 'hidden' targets. It's bad enough that they are hidden from us, the crunchers (at least some of us can tease them out in these discussions), but my main issue is that they are hidden from the software that runs on our machines. Most people, even we frequent chatterers, effectively run our machines in 'set and forget' mode, just tweaking them once in a blue moon. Any targets ought to be set in a way that our machines know about and can react to. That way the targets become realisable, and are not just random and arbitrary. Far more machines will hit a target that they strive to reach than will hit a target that just sits in the air. In the long run, the grid (and the science) will perform better that way.
Let's be honest, if the techs really think the current way is best in the present circumstances then perhaps it is. But it's my belief that if it's possible to set up the system to deal with such circumstances in a sensible, reactive way, then that is what should be done. Software evolves over time to meet changing circumstances, and WCG uses BOINC a little differently from other projects, so they have to be creative. But they can also feed their requirements into the future development of BOINC (or even do it themselves).
I'm just asking that some (more) thought be given to this area. If the techs can come up with a way that uses the current abilities of the client software to better meet their goals for the science, then they should do so. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
When I read through this thread I get the impression that the dilemma is (A) we need to return ARP results as fast as possible and (B) many of us, including me, don't want to run without a work cache. My idea is to make the deadline for ARP tasks shorter than for other tasks, say five days. That way I could still have a cache of one or two days for other work but selectively trigger panic mode for ARP tasks, making them bypass the queue.
The 'panic' state can be triggered automatically by setting a fake cache (aka connect every) to half the deadline of ARP. Given that the website profiles can be set to a hard number of tasks to buffer, of which the client is unaware, there will be continuous attempts to connect. One warning with that, IIRC: if ALL threads of BOINC run in high priority, the work fetching stops too.
BOINC development seems to be pretty much 'maintenance' and volunteers only, so have no high expectations of anything advanced happening in either the client or the server side software. |
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
DrMason refers to using Device Profiles to control the work that he does, but that only controls the volume of WUs held in cache.
I combine that with app_config, which controls the WUs actually being crunched. To allow for shortages, I usually set those limits to total 1 or 2 more than the 8 threads that I have on my PC. It is also useful to allow for high level 3 issues or RAM or any other capacity limitations.
Mike |
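A minimal sketch of the kind of app_config.xml Mike describes, generated from Python so the arithmetic is visible. The `<app>`, `<name>` and `<max_concurrent>` elements are the standard BOINC app_config.xml ones; the app names `arp1` and `mcm1` and the limits 2 and 8 are assumptions chosen for illustration (check your own client for the real short names), picked so that the limits total 2 more than 8 threads.

```python
from xml.sax.saxutils import escape

THREADS = 8  # from the post: an 8-thread PC


def build_app_config(limits):
    """Return app_config.xml text with one <app> entry per application.

    `limits` maps an application short name to its max_concurrent value.
    """
    lines = ["<app_config>"]
    for app_name, max_concurrent in limits.items():
        lines += [
            "  <app>",
            f"    <name>{escape(app_name)}</name>",
            f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>",
            "  </app>",
        ]
    lines.append("</app_config>")
    return "\n".join(lines)


# Hypothetical app names and limits, for illustration only.
limits = {"arp1": 2, "mcm1": 8}
config_text = build_app_config(limits)

# How far the per-app limits add up beyond the available threads.
headroom = sum(limits.values()) - THREADS

print(config_text)
print("limits exceed thread count by", headroom)
```

With these assumed numbers, a shortage of one kind of work still leaves enough of the other to keep all 8 threads busy, while no more than 2 of the larger tasks ever run at once.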
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
Dang, this conversation got deep quickly :)
Api, to give you an idea of why we use that avg turn around: it is due to caches. You can set your cache to 10 days if you want to. This means that even though your machine is reliable, it may have a turn around of 10 days per work unit. If we set the deadline to 35% of a 10 day deadline, your machine would be putting results in higher priority more often. Now, you're still considered a reliable host, but we have a slew of results that need a reliable host, meaning that your queue is now filled with 10 days of results (I know, an extreme case, but it makes my point) that need to be returned in 3.5 days... you have 6.5 days of work that is going to be 'too late'. These work units would then need to be resent, after your deadline is missed, to another machine that is reliable, and your machine would now be marked as unreliable because you had results considered bad.
The avg turn around was set many moons ago... when most of our applications were around the 4-6 hour range, this meant it was set for 6-9 times the avg workunit return pace. With these larger workunits it was now closer to 1.5 times a workunit length... thus needing to be tested at a higher value. There was discussion about making this setting at the application level instead of project wide, but we are experimenting with the setting change, as adding that feature would take considerably more time to write and test. Changing the setting shouldn't affect our projects with smaller runtimes much, but we are watching those as well. This setting should increase the storage needed on our backend, since it'll cause batches to return slower, as mentioned above.
What are these hidden targets you're looking for? The setting for becoming reliable wasn't purposefully hidden, but is more technical than, say, the average user (99%) would care to know about. Are there other hidden targets you're wanting to know about?
DrMason, welcome to the forums. Thanks to you and hchc for the constructive conversation.
Thanks,
-Uplinger |
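The numbers in the post above can be checked with a short worked example. This is only a sketch under stated assumptions: the queue is processed in order at one day of work per day, and the 36-hour turnaround threshold and 24-hour large work unit are values inferred here from the "6-9 times" and "1.5 times" figures, not settings quoted by the techs.

```python
def late_work_days(cache_days, deadline_days):
    """Days of queued work that cannot be returned before the deadline,
    assuming the queue drains in order at one day of work per day."""
    return max(0.0, cache_days - deadline_days)


# The extreme case from the post: a 10 day cache and a reduced deadline
# of 35% of 10 days.
cache_days = 10.0
reduced_deadline_days = 0.35 * 10.0
late_days = late_work_days(cache_days, reduced_deadline_days)
print(f"deadline {reduced_deadline_days:.1f} days, late work {late_days:.1f} days")

# ASSUMPTION: a 36 hour threshold is merely consistent with "6-9 times"
# a 4-6 hour work unit; it is not a value stated in the thread.
ASSUMED_THRESHOLD_HOURS = 36.0


def threshold_ratio(task_hours, threshold_hours=ASSUMED_THRESHOLD_HOURS):
    """How many work unit lengths fit inside the turnaround threshold."""
    return threshold_hours / task_hours


# 24 hours is likewise an inferred length for the "larger workunits".
ratios = {hours: threshold_ratio(hours) for hours in (4.0, 6.0, 24.0)}
for hours, ratio in ratios.items():
    print(f"{hours:.0f} hour work unit: threshold is {ratio:.1f}x its length")
```

Under those assumptions the same threshold that left 6 to 9 work unit lengths of slack for short tasks leaves only about 1.5 lengths for a day-long task, which is the squeeze the setting change is meant to relieve.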
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Dang, this conversation got deep quickly :)
That is because some of us are intensely interested and want to optimize the efficiency of our crunching. I, for one, appreciate when any of the techs chime in with additional "backend" information on any of the projects or how WCG is dealing with various issues.
Thanks, Uplinger.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
flensr
Cruncher Joined: Oct 31, 2018 Post Count: 25 Status: Offline Project Badges:
|
So then... How can I get my first WU for this project? Not one assigned WU yet, not sure what I need to do. Can't be reliable or unreliable if I can't even get one WU.
||
|
|
floyd
Cruncher Joined: May 28, 2016 Post Count: 47 Status: Offline Project Badges:
|
When I read through this thread I get the impression that the dilemma is (A) we need to return ARP results as fast as possible and (B) many of us, including me, don't want to run without a work cache. My idea is to make the deadline for ARP tasks shorter than for other tasks, say five days. That way I could still have a cache of one or two days for other work but selectively trigger panic mode for ARP tasks, making them bypass the queue.
The 'panic' state can be triggered automatically by setting a fake cache (aka connect every) to half the deadline of ARP. Given that the website profiles can be set to a hard number of tasks to buffer, of which the client is unaware, there will be continuous attempts to connect. One warning with that, IIRC: if ALL threads of BOINC run in high priority, the work fetching stops too.
Fetching work or not, if all tasks run in high priority I have missed my original goal, which is to process ARP tasks before any other. That's why I think ARP tasks should have shorter deadlines than others, to make them switch to high priority first.
BOINC development seems to be pretty much 'maintenance' and volunteers only, so have no high expectations of anything advanced happening in either the client or the server side software.
There are no software changes necessary for this, just creative use of existing mechanisms as you described above. |
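The idea in this exchange can be illustrated with a toy model. It is not BOINC's actual scheduler: it assumes a single thread, a first-in-first-out queue, and a rule that a task is treated as at risk when the work ahead of it plus its own runtime exceeds its deadline minus the "connect every" interval. The 5 day ARP deadline and the half-deadline interval come from the posts; the 7 day deadline for other work, the task lengths and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_days: float  # estimated compute time still needed
    deadline_days: float   # time left until the report deadline


def is_at_risk(task, queue_ahead_days, connect_every_days):
    """Toy rule: the connect interval is treated as a safety margin
    that is subtracted from the deadline."""
    effective_deadline = task.deadline_days - connect_every_days
    return queue_ahead_days + task.remaining_days > effective_deadline


def schedule(queue, connect_every_days):
    """Return task names in run order: at-risk tasks first (earliest
    deadline first), then everything else in original queue order."""
    urgent, normal = [], []
    ahead = 0.0
    for task in queue:
        if is_at_risk(task, ahead, connect_every_days):
            urgent.append(task)
        else:
            normal.append(task)
        ahead += task.remaining_days
    urgent.sort(key=lambda t: t.deadline_days)
    return [t.name for t in urgent + normal]


# Two days of other work (assumed 7 day deadline) queued ahead of one
# ARP task with the proposed 5 day deadline.
queue = [Task(f"other-{i}", 0.5, 7.0) for i in range(1, 5)]
queue.append(Task("arp", 0.75, 5.0))

# "Connect every" set to half the ARP deadline, as suggested above.
order = schedule(queue, connect_every_days=2.5)
print(order)
```

In this model only the ARP task crosses the at-risk line, so it jumps the queue while the other work keeps its place. Raising the interval far enough would flag every task, which is the situation both posters warn about.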
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
So then... How can I get my first WU for this project? Not one assigned WU yet, not sure what I need to do. Can't be reliable or unreliable if I can't even get one WU.
What kind of machine are you running and how many cores? Right now the ARP work units are few and far between. There are several hundred thousand cores looking for just a couple of thousand work units each day. If you are patient, and I am sure you are, you will eventually get one.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
|
So then... How can I get my first WU for this project? Not one assigned WU yet, not sure what I need to do. Can't be reliable or unreliable if I can't even get one WU.
What kind of machine are you running and how many cores? Right now the ARP work units are few and far between. There are several hundred thousand cores looking for just a couple of thousand work units each day. If you are patient, and I am sure you are, you will eventually get one.
Cheers |
||
|
|
DCS1955
Veteran Cruncher USA Joined: May 24, 2016 Post Count: 668 Status: Offline Project Badges:
|
I am 3 days short of gold. I got most of them at 38 minutes past the hour, prior to the fairer introduction of randomization to HSTB & ARP. I have now gone the route of a randomized task manager. Much less of a fish-in-a-barrel situation, but fair to everyone.
[Edit 1 times, last edit by dcs1955 at Nov 28, 2019 4:33:49 AM] |
||
|
|
|