| World Community Grid Forums
Thread Status: Active Total posts in this thread: 8
mreuter80
Advanced Cruncher Joined: Oct 2, 2006 Post Count: 83 Status: Offline Project Badges:
|
Hey team,
I saw the following on one of my WUs (the bold entry is from my computer):

E200925_ 880_ A.26.C20H11NS3Se2.26.1.set1d06_ 3-- 635 Valid 1/17/11 00:23:19 1/17/11 12:04:02 8.03 229.7 / 120.5

Looking at my wingman's runtime, something seems to be wrong. After checking his status information I found out that only job 0 finished and job 1 exited with an error. All other jobs were skipped.

I know that a WU doesn't need to finish the entire set of 16 jobs to be valid. However, if there is such a discrepancy (especially if there is an error in the WU), I think something doesn't work properly. Also, if only job 0 will be used for analysis, I think I have wasted 8 hours of precious energy. Why do we actually have a quorum of 2 if it allows such differences?

Maybe this was just a freak accident, but maybe someone can have a look at it.

Cheers

[Edit 1 times, last edit by mreuter80 at Jan 17, 2011 6:22:57 PM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello mreuter80,
"Why do we actually have a quorum of 2 if it allows such differences?"

Only the project scientists know just what the validation criteria currently are (and why). But apparently, if a computer has a record of returning valid units, and if another computer validates at least one job from that computer, then they are willing to accept the results of the CPU that returns a number of jobs. So they appear to demand at least a little validation (in order to avoid excessive gullibility), but they are trying not to waste CPU time. Doesn't bother me at all!

Lawrence
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear mreuter80,
The validation is performed on a job that was finished by both wingmen, in this case job 0. If validation at that point is positive, it is unlikely that the rest of the results differ, hence we are satisfied with that. You are right that it would be more efficient to not run the validation copy all the way to conclusion, since the results are not used anyway. We had a detailed discussion on this issue with the IBM team, but it seems that it's technically not feasible at this point. So we have to live with it.

Best wishes from
Your Harvard CEP team
||
|
|
mreuter80
Advanced Cruncher Joined: Oct 2, 2006 Post Count: 83 Status: Offline Project Badges:
|
Thanks for the quick answers. I'm still not clear on one thing. Is the research team then using only the result from job 0, or the results from all 16 jobs? Or will there be another WU sent out that contains only jobs 1 - 15?
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
snip
"You are right that it would be more efficient to not run the validation copy all the way to conclusion, since the results are not used anyways. We had a detailed discussion on this issue with the IBM team but it seems that it's technically not feasible at this point. So we have to live with it."

I presume the --validation-- copy meant here is the one that ran 0.07 hours. I'm unsure how significant the loss really is if the task just cycles to the end in a matter of a minute once an end condition (for instance the RC=x100) or kill condition is encountered.

thx.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi mreuter80,
Sorry. I should have stated it explicitly. All the jobs returned, up to job 15 (the 16th), are accepted if a wingman validates at least the first job (job 0).

Lawrence
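For readers who prefer to see the rule spelled out, here is a minimal sketch of the acceptance logic as described in this thread: compare the jobs that both wingmen finished, and if they agree, accept every job returned by the host that got further. The function name `accepted_jobs`, the use of one number per job, and the tolerance `tol` are all assumptions made for illustration; this is not the actual WCG/BOINC validator.

```python
from typing import List, Optional, Sequence


def accepted_jobs(
    host_a: Sequence[Optional[float]],
    host_b: Sequence[Optional[float]],
    tol: float = 1e-6,  # hypothetical comparison tolerance
) -> List[int]:
    """Return the indices of the jobs that would be accepted.

    Each sequence holds one value per job; None marks a job that
    errored out or was skipped on that host.
    """
    # Jobs finished by both wingmen are the only ones that can be compared.
    shared = [
        i for i, (x, y) in enumerate(zip(host_a, host_b))
        if x is not None and y is not None
    ]
    if not shared:
        return []  # nothing in common, so nothing can be validated

    # Any disagreement on a shared job fails the whole pair.
    if any(abs(host_a[i] - host_b[i]) > tol for i in shared):
        return []

    # Validation passed: accept everything from the host that returned more jobs.
    done_a = sum(x is not None for x in host_a)
    done_b = sum(y is not None for y in host_b)
    best = host_a if done_a >= done_b else host_b
    return [i for i, x in enumerate(best) if x is not None]
```

In the situation from the first post, one host returns all 16 jobs and the wingman returns only job 0; if job 0 matches, all 16 jobs from the first host count as accepted under this sketch.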
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi SekeRob et al.,
These special cases are just not easy to implement in BOINC/WCG. But there is some good news: this thread has triggered a new idea on how to do the validation, which would basically eliminate the waste of the current setup. It may take a month or two to implement, but it is high up on the agenda. We'll keep you posted on how this is going.

Best wishes from
Your Harvard CEP team
||
|
|
mreuter80
Advanced Cruncher Joined: Oct 2, 2006 Post Count: 83 Status: Offline Project Badges:
|
Thanks all again for the quick answers.
I'm glad my resources were not wasted and my stupid question eventually triggered some new idea. Please keep us posted about the "new idea". I will close the thread as solved. Cheers!
||