World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: DDD2 Type B work units going out. |
Thread Status: Active Total posts in this thread: 369
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges: |
Now I understand what happened. My two wingmen validated faster than me, so the quorum was reached and I was %&@#*%. Now the problem is that I have a 1 day buffer, so when a beta comes it is queued and processed a day later. With this logic I will be %&*@#& every time.

A question then to calm me down. I understand that here we are in a speed race. OK then. When a Beta WU is received, how can I select it from the bottom of my queue, suspend an existing one and put the beta to work? Is this possible, or would the only solution be to set my buffers to very short periods, risking having devices go idle for lack of WUs due to communication problems, WCG server problems, etc.?

Thanks for some urgent help here. My Beta situation is in danger.
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
Quote: Thanks. Next time I will suspend and then immediately resume! O/T @Boinc developers and the WCG Techs: It was no more user error than a Boinc/WCG failure to communicate! There was no warning message to say that WCG communication would be stopped after suspending the task: no text tip, and nothing in Messages; it just said Task Suspended By User, with no mention of communications being suspended.

Communications aren't suspended, only work requests.

Quote: It is unreasonable to presume that suspending one single task should inherently lead a user to think that all work would be stopped because networking would be stopped. If I wanted to stop networking I could do this, and if I wanted to suspend all WCG tasks I would naturally do this under Projects rather than Tasks. The bottom line is that the change I made (suspending a single task) also made an unintuitive change that resulted in WCG losing half a day's work. It is an unwanted and unhelpful feature!

This behaviour is mainly a safety feature, to guard against the client becoming overloaded with work it has no hope of returning by the deadline. Since the client has no idea why the user is suspending tasks, continuing to ask for more work is dangerous: if the user resumes the paused tasks, the client can be hopelessly overcommitted. The same happens if the suspension is due to something that looks like a problem with the tasks but, after checking the forums, turns out not to be a problem. Also, before this feature was added, users more than once forgot to resume tasks...

There are also some other reasons for this feature. Some of them aren't so important any longer because of other features added later on. Still, two that are still in effect are to make "cherry-picking" harder for users, and to stop users from continuing to download work only to ditch it afterwards.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Edit 1 times, last edit by Ingleside at Jan 16, 2010 6:03:42 PM] |
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
Quote: A question then to calm me down. I understand that here we are in a speed race. OK then. When a Beta WU is received, how can I select it from the bottom of my queue, suspend an existing one and put the beta to work? Is this possible, or would the only solution be to set my buffers to very short periods, risking having devices go idle for lack of WUs due to communication problems, WCG server problems, etc.?

Just to make it general, in case you're also running other projects, the "best" method is generally:
1: Suspend network.
2: Suspend other BOINC projects, except for any GPU-only projects.
3: Select all unstarted tasks between the running tasks and the particular tasks you want to run, and suspend them.
4: Just to be on the safe side, it can also be a good idea to suspend any other unstarted tasks besides the tasks you want to run.
5: Suspend one or more of your currently running tasks, so the particular tasks start instead.
6: When all the tasks you want to run have started, resume any other tasks in the project. This can only be a problem if you're overcommitted.
7: Resume any other BOINC projects.
8: Enable network again.

The reason not to just suspend the running tasks is so you don't start a bunch of other tasks before you come to the tasks you really want to run.

edit - changed #2 a little.
[Edit 1 times, last edit by Ingleside at Jan 16, 2010 6:19:00 PM] |
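The eight steps above can be sketched as a small script that only builds the command lines, without running anything. This is a rough sketch, not an official procedure: it assumes the `boinccmd` tool with its `--set_network_mode`, `--project` and `--task` options (check them against your client version), and the function name, the task dictionaries and the example URLs are all made up for illustration.

```python
from typing import Dict, List, Sequence


def plan_priority_run(
    tasks: Sequence[Dict[str, str]],
    wanted: Sequence[str],
    project_url: str,
    other_project_urls: Sequence[str] = (),
) -> Dict[str, List[List[str]]]:
    """Build boinccmd command lines for the eight steps (nothing is executed).

    tasks: the queue in display order, each {"name": ..., "state": "running" | "ready"}.
    wanted: names of the unstarted tasks that should start now.
    other_project_urls: projects to pause; leave GPU-only projects out (step 2).
    """
    def task_cmd(name: str, op: str) -> List[str]:
        return ["boinccmd", "--task", project_url, name, op]

    before = [["boinccmd", "--set_network_mode", "never"]]           # step 1
    before += [["boinccmd", "--project", url, "suspend"]             # step 2
               for url in other_project_urls]
    # Steps 3-4: suspend every unstarted task except the wanted ones.
    parked = [t["name"] for t in tasks
              if t["state"] == "ready" and t["name"] not in wanted]
    # Step 5: suspend one running task per wanted task to free a slot.
    running = [t["name"] for t in tasks if t["state"] == "running"]
    displaced = running[:len(wanted)]
    before += [task_cmd(n, "suspend") for n in parked + displaced]

    # Run the "after" commands only once the wanted tasks show as running.
    after = [task_cmd(n, "resume") for n in displaced + parked]      # step 6
    after += [["boinccmd", "--project", url, "resume"]               # step 7
              for url in other_project_urls]
    after.append(["boinccmd", "--set_network_mode", "auto"])         # step 8
    return {"before": before, "after": after}


# Hypothetical queue: one running task, one waiting task, one beta at the bottom.
queue = [
    {"name": "regular_1", "state": "running"},
    {"name": "regular_2", "state": "ready"},
    {"name": "beta_1", "state": "ready"},
]
plan = plan_priority_run(queue, ["beta_1"], "http://example.org/project/")
for cmd in plan["before"] + plan["after"]:
    print(" ".join(cmd))
```

The plan is split into "before" and "after" on purpose: the resume commands in step 6 should only be issued once the wanted tasks have actually started, which a person has to check in the client.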
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Quote: Now I understand what happened. My two wingmen validated faster than me, so the quorum was reached and I was %&@#*%. Now the problem is that I have a 1 day buffer, so when a beta comes it is queued and processed a day later. With this logic I will be %&*@#& every time. A question then to calm me down. I understand that here we are in a speed race. OK then. When a Beta WU is received, how can I select it from the bottom of my queue, suspend an existing one and put the beta to work? Is this possible, or would the only solution be to set my buffers to very short periods, risking having devices go idle for lack of WUs due to communication problems, WCG server problems, etc.? Thanks for some urgent help here. My Beta situation is in danger.

The order to push jobs, assuming your tasks
1) Suspend the ready-to-run tasks from below the running tasks down to the one before the one you wish to run.
2) Suspend a running task, or as many as needed, to force the start of 1 or more priority-selected jobs.
3) Resume the suspended task(s) **, the bit that skgiven did not [forgot] to do until it was too late and his client idled.

There are different ways to do this, but I'll leave the members to write it up, and if glossy enough I'll copy-post it into the FAQs.

edit: ** The resume will set the interrupted tasks to the "waiting to run" state.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Edit 2 times, last edit by Sekerob at Jan 16, 2010 6:20:17 PM] |
||
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges: |
I understand the point regarding Suspend and Resume. I also understand the cherry-picking issue. OK. Fine. Agreed.

But the latest Beta I have received (not the one that was server aborted, but a new one) is listed with 15 hours of CPU time. If the task gets server aborted after 10 hours of running, that may be a little unfair. Not to me, since I have enough CPUs and cores (even if I am not happy with it), but to those who have smaller systems, laptops, etc., where one task can tie them up for a long time and then get shot away with 0 points. Imagine what happens with a Beta needing 100 hours of CPU time. When someone is too late, you should at least reward him for the time he crunched before being aborted by the server. Maybe it is already so. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Betas or regular jobs that are processed and returned too late still get credit for points/time/results. There's an unofficial grace period for No Reply tasks, until one day too many people start planning around grace periods... it should be the exception, not the rule!

edit: and it seems my metaphor missed its target.... Once rounds have been fired, as in the job has started, there is no mission abort, unless the jobs are known to be bad.
[Edit 1 times, last edit by Sekerob at Jan 16, 2010 6:25:33 PM] |
||
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline Project Badges: |
Thanks, Sekerob. You deserved an excellent Italian
Now I am again a happy Beta cruncher. |
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
Quote: I understand the point regarding Suspend and Resume. I also understand the cherry-picking issue. OK. Fine. Agreed. But the latest Beta I have received (not the one that was server aborted, but a new one) is listed with 15 hours of CPU time. If the task gets server aborted after 10 hours of running, that may be a little unfair. Not to me, since I have enough CPUs and cores (even if I am not happy with it), but to those who have smaller systems, laptops, etc., where one task can tie them up for a long time and then get shot away with 0 points. Imagine what happens with a Beta needing 100 hours of CPU time. When someone is too late, you should at least reward him for the time he crunched before being aborted by the server. Maybe it is already so.

This isn't a problem, since there are two kinds of server aborts, and the normal one won't abort started tasks. There's also an unconditional abort, but this is only used if the WU has errored out, the project has manually cancelled the WU, or it's so long after the deadline that the result can't be validated anyway.

edit - slow typer...
[Edit 1 times, last edit by Ingleside at Jan 16, 2010 6:33:41 PM] |
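The distinction between the two abort kinds can be written down as a tiny decision function. This is only an illustration of the rule as described in this post, not BOINC's actual code; the names `AbortKind`, `abort_kind_for` and `should_abort` are hypothetical.

```python
from enum import Enum


class AbortKind(Enum):
    IF_NOT_STARTED = "abort only if the task has not started"
    UNCONDITIONAL = "abort even if the task is already running"


def abort_kind_for(wu_errored: bool, wu_cancelled: bool,
                   too_late_to_validate: bool) -> AbortKind:
    """Pick the abort kind the server would send for a result it no longer needs."""
    if wu_errored or wu_cancelled or too_late_to_validate:
        return AbortKind.UNCONDITIONAL
    return AbortKind.IF_NOT_STARTED


def should_abort(kind: AbortKind, task_started: bool) -> bool:
    """Return True if the client drops the task for the given abort kind."""
    if kind is AbortKind.UNCONDITIONAL:
        return True
    return not task_started


# A beta that has been crunching for 10 hours when the quorum is reached elsewhere:
kind = abort_kind_for(wu_errored=False, wu_cancelled=False, too_late_to_validate=False)
print(should_abort(kind, task_started=True))
```

In this model a started task is only dropped when the work unit itself is bad, cancelled, or hopelessly late, which is the case Hypernova was worried about.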
||
|
sk..
Master Cruncher http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
Sek and Ingleside, thanks for your explanations. It is still a learning curve for me, and might be for some time. I hope these posts will help others too.

Sek, keep at them.

Ingleside, as you say, communications are not suspended but work requests are. I would add reporting to that, as the tasks did not get reported until I did this manually. I still don't understand why task reporting is not performed at the same time as uploading.

Such matters are only Beta related; suspending a task is far from a normal procedure. Even with Beta tasks I would normally resume, but it was late and I had no idea of the dire consequences. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Uploading the result files is part 1 of the process, and Ready to Report (RtR) is part 2; the two are held in different parts of the DB. The RtR causes material server scheduling activity. If it happened immediately with each result file upload, presently 450,000+ times a day, the load would be very high. So, for efficiency's sake, RtRs are held for up to 24 hours so they can be combined with, for instance, a work request, which hits the same server scheduler. A combined RtR + work request halves the load. There are more client scheduler conditions under which an RtR gets reported, btw... 7 or 8. For those, see the famous FAQs.
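The "halves the load" point can be checked with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simplified model that is my assumption, not WCG's real accounting: every result needs one work request to fetch it and one report to return it, and the function name is made up.

```python
def scheduler_requests_per_day(results_per_day: int,
                               combine_report_with_work_request: bool) -> int:
    """Count scheduler contacts under a simple one-work-request-per-result model."""
    work_requests = results_per_day  # assumption: one work request per result
    if combine_report_with_work_request:
        # The report rides along with a work request, so it costs no extra contact.
        return work_requests
    # Otherwise each result triggers its own separate report contact.
    return work_requests + results_per_day


separate = scheduler_requests_per_day(450_000, False)
combined = scheduler_requests_per_day(450_000, True)
print(separate, combined, combined / separate)  # 900000 450000 0.5
```

Under this model, deferring the report until the next work request cuts scheduler contacts from 900,000 to 450,000 a day, at the cost of results sitting as Ready to Report for a while.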
||
|
|