World Community Grid Forums
Thread Status: Active. Total posts in this thread: 6
Dirk Gently
Senior Cruncher, England. Joined: Mar 1, 2005. Post Count: 153. Status: Offline
I've just read the announcement from knreed in Member News about the policy change in how workunit redundancy is handled.

It all seems sensible to me. It is good to see an increase in grid efficiency and yield. I have the following questions:

1) Why can't we go even further and send a workunit to just 2 computers, seeking a third result only in the event of a discrepancy? The chances of 2 workunits sent to randomly selected computers being in error, or tampered with, in exactly the same way must be very slim.

2) Some time ago I set BOINC to the "Leave in memory when preempted" option on advice from WCG, because of a bug that was likely to cause an error. Is this bug fixed in the latest BOINC, or was it caused by the project software? If it is the fault of the project software, does it affect both HPF and FAAH?

3) I presume it is possible for the results of 2 returned units to differ even though both completed successfully. Is it possible for us to know whether each of our returned workunits matched the quorum? Is this what "Valid" or "Invalid" tells us? What is "Inconclusive"?

Feedback about our machines' stability and processing reliability is very useful. The grid also benefits, because crunchers learn that their machines are bad and are likely to do something about it. What about having a "rank" for machines, based on their results history?
----------------------------------------
[Edit 2 times, last edit by Dirk Gently at May 27, 2006 1:38:11 AM]
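Question 1 describes sending 2 copies first and issuing a third only when they disagree. The sketch below is a minimal illustration of that idea, assuming results can be compared for exact equality; `validate_adaptive`, `run_copy`, and the `max_copies` cap are hypothetical names, not part of BOINC or WCG.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Tuple


def validate_adaptive(
    run_copy: Callable[[int], Hashable],
    initial_copies: int = 2,
    needed_matches: int = 2,
    max_copies: int = 6,
) -> Tuple[Optional[Hashable], int]:
    """Hypothetical 'send 2, add a tie-breaker on disagreement' scheme.

    run_copy(i) stands in for sending copy i of a workunit to a host and
    receiving its result. Returns (accepted result or None, copies used).
    """
    results = [run_copy(i) for i in range(initial_copies)]
    while True:
        # Most frequent result so far and how many hosts agree on it.
        value, count = Counter(results).most_common(1)[0]
        if count >= needed_matches:
            return value, len(results)
        if len(results) >= max_copies:
            # Give up: no agreed result within the (assumed) copy cap.
            return None, len(results)
        # Discrepancy: send out one more copy as a tie-breaker.
        results.append(run_copy(len(results)))


# Two hosts agree, so no third copy is needed.
print(validate_adaptive(lambda i: ["A", "A"][i]))       # ('A', 2)
# The first two hosts disagree, so a third copy breaks the tie.
print(validate_adaptive(lambda i: ["A", "B", "A"][i]))  # ('A', 3)
```

When hosts agree, this uses 2 copies instead of 3; the cost is an extra round trip, and a longer wait, whenever the first two results disagree.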
retsof
Former Community Advisor, USA. Joined: Jul 31, 2005. Post Count: 6824. Status: Offline
It never hurts to have a tiebreaker, which requires an odd number of units. If there are 2 and they differ, which one is correct? Both would have to be sent out again. And could 2 be the same and both wrong?

I have seen other projects that use a quorum of 2, or even 1 for very short workunits. For the length and complexity of the workunits we've got, 3 sounds like a reasonable compromise, with a 4th if needed.

I just downloaded the latest version of BOINC, 5.4.9, which is working fine on several projects and has some improved statistics handling. Go for it. It's an easy drop-in replacement; just remember to exit the running BOINC before running the reinstall.
SUPPORT ADVISOR
----------------------------------------
Work+GPU i7 8700 12 threads; School i7 4770 8 threads; Default+GPU Ryzen 7 3700X 16 threads; Ryzen 7 3800X 16 threads; Ryzen 9 3900X 24 threads; Home i7 3540M 4 threads 50%
[Edit 4 times, last edit by retsof at May 27, 2006 2:11:23 AM]
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
Hello Dirk Gently,
Quote: 2) Some time ago I set BOINC to the "Leave in Memory when preempted" option on advice from WCG. This was because of a bug which was likely to cause an error. Is this bug fixed yet in the latest BOINC, or was it to do with project software? If it is the fault of project software, does it affect both HPF and FAAH?

This bug has been fixed by calling TerminateProcess() rather than by calling Exit() in the application. There was a race condition between threads that would sometimes cause problems. BOINC 5.4.9 now waits until the application has properly terminated before proceeding. As far as I know, there is no reason to select 'Leave in Memory . . .' and plenty of reasons (memory capacity especially) NOT to select it.

Quote: 3) I presume that it is possible for the results of 2 returned units to differ even though they were both successfully completed. Is it possible for us to know whether each of our returned workunits matched the quorum or not? Is this what "Valid" or "Invalid" tells us? What is "Inconclusive"?

The status definitions are here: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=6105
Since we are using the Quorum of Three method, there must be at least 3 identical results before we define that to be the Valid result. Therefore, if your result is Valid, it matches; if it is Invalid, it does not match. Inconclusive means that we got back at least 3 results but could not choose a Valid result, so we have sent out more copies of that work unit.

Quote: 1) Why can't we go even further and only send a workunit to just 2 computers, seeking a third result in the event of a discrepancy? The chances of 2 workunits sent to randomly selected computers being in error or tampered with in exactly the same way must be very slim.

Good thinking. We will want to gather a lot of statistics on errors before we broach this possibility. 'I tell you three times' is a very safe rule, but it might not be statistically justifiable.

Lawrence
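As a rough model of the status rules Lawrence describes, the sketch below classifies the returned results of one workunit under a quorum of three. It assumes "matching" means exact equality, and the "Pending Validation" label for fewer than 3 returned results is an assumption, not something defined in this thread; `classify` is a hypothetical name.

```python
from collections import Counter
from typing import List

QUORUM = 3  # "Quorum of Three": 3 identical results are required


def classify(results: List[str]) -> List[str]:
    """Hypothetical model of the Valid/Invalid/Inconclusive rules.

    `results` holds the returned results of one workunit; two results
    "match" only if they are exactly equal (an assumption).
    """
    if not results:
        return []
    value, count = Counter(results).most_common(1)[0]
    if count >= QUORUM:
        # A Valid result exists: matching results are Valid, the rest Invalid.
        return ["Valid" if r == value else "Invalid" for r in results]
    if len(results) >= QUORUM:
        # At least 3 results back, but no 3 agree: more copies get sent out.
        return ["Inconclusive"] * len(results)
    # Fewer than 3 results back so far (label is an assumption).
    return ["Pending Validation"] * len(results)


print(classify(["x", "y", "x", "x"]))  # ['Valid', 'Invalid', 'Valid', 'Valid']
print(classify(["x", "y", "z"]))       # ['Inconclusive', 'Inconclusive', 'Inconclusive']
```

In this model, an Inconclusive workunit becomes decidable once enough extra copies come back for one result to reach 3 matches.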
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0. Status: Offline
Quote: 1) Why can't we go even further and only send a workunit to just 2 computers, seeking a third result in the event of a discrepancy? The chances of 2 workunits sent to randomly selected computers being in error or tampered with in exactly the same way must be very slim.

I think I recall reading somewhere that the quorum of 3 is a contractual obligation to the project scientists. They want 3 matching results, so WCG has to provide that.
[Edit 1 times, last edit by Former Member at May 27, 2006 3:18:15 AM]
Sekerob
Ace Cruncher. Joined: Jul 24, 2005. Post Count: 20043. Status: Offline
On Dirk Gently's point 2 and Lawrence's response ("...plenty of reasons not to..."): my observation is that if a WU is left in memory and the computer crashes or is rebooted, the WU very likely gets corrupted and is closed, for a score of zero. When it is not left in memory, on restart it picks up from the last saved data point. You lose several minutes of crunch time (I have seen up to 7), which anyone should be able to live with.

Anyway, on the whole validation discussion: I've been receiving a few WUs of this new 4th-copy type, with 2 Valid results already in and the 3rd copy either not returned at all or returned as an error. When I see these, I suspend the WU in progress, but only if it does not already have 2 results in PV (Pending Validation). The 4th-copy WU is pulled forward in the queue to await its turn, and afterwards the WUs with 0 or 1 results in PV are released again.

I speculate that the shorter you set the contact time (BUT NOT LESS THAN THE ORIGINAL DEFAULT OF 0.1 days), the more likely you are to be sent a 4th copy, since a host deemed reliable would return the result quickest. Only when you plan to be away for a longer time and take your workhorse offline, as I do, should you set the contact parameter to however long you will be away.

That said, if the 4th copy is sent out after 4 days of silence, it's pretty pointless to store more than, say, 5 days' worth of work, since the older copies very probably already have a quorum of 3, making the 4th an academic waste (unless you're just out for the points). I'd say 99.9% of crunchers are in an area where their ISP is up 99.9% of the time and the power grid is similarly reliable. So the few I know who like to store a week's worth (valid reasons excluded) are doing the cause no favour: they're just point-hungry, like to stand out in their team once a week, and force WCG to send out a 4th copy!

Should the 3-week validity period then be reconsidered? No: for those with slow machines, or with WUs that really do take a long time, it would not be fair. Does the distribution system take this into account, i.e. fast machines taking a week, without sending out a 4th/5th/6th copy? (I have one WU that now has 8 copies, including 2 errors.) I suppose one could tweak this out, or accept it as some redundancy.

Saturday morning, A-life and 1 doppio espresso down... enough WCG for the day. Have a nice extension of the long weekend!
WCG
Please help to make the Forums an enjoyable experience for All!
Dirk Gently
Senior Cruncher, England. Joined: Mar 1, 2005. Post Count: 153. Status: Offline
Thanks to everyone for the replies.

It's good to know that (2) is fixed. I already have BOINC 5.4.9 installed, so I will try out the other setting. It doesn't really matter on my desktop machine, because I seldom need to pause it, but it may be useful on my laptop, which has less memory. With the old BOINC and the "safe" setting of "Leave in memory when preempted", I still sometimes got a WU error after stopping and starting BOINC. This has not happened yet with the new version, but it is early days. I try to avoid reboots anyway, and nervously check BOINC to make sure it is not nearing the completion of a large WU!

OK, I should have known the answer to (3) already! At the risk of being thick again, I'll ask about the time period of the results history shown under "Device Manager" > "Results Status". It seems to go back 1 month. Is it possible to go back further at this level of detail? As far as I know, I have never had an "Invalid" or "Inconclusive" result, but it would be nice to check.

No one responded to my very last point, probably because I used the word "rank", and we already have a rank system. What I meant was a kind of "Quality Index", based on the quality of the WUs returned, in addition to the other purely quantitative stats. It would be something like:

Quality Index = 100 x (no. of good WUs - no. of bad WUs) / (total WUs) %

This could be calculated as a rolling figure based on, say, the last 3 months of data. It would add quality to the criteria for achievement, and discourage some of the more reckless kinds of overclocking!
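The proposed Quality Index is easy to compute from a results history. The sketch below is a hypothetical implementation, assuming every returned WU counts as either good or bad (so total = good + bad) and that "the last 3 months" means a 91-day rolling window; `quality_index` and the `(date, was_good)` history format are illustrative only.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def quality_index(
    history: Iterable[Tuple[date, bool]],
    today: date,
    window_days: int = 91,  # assumption: "the last 3 months" as 91 days
) -> Optional[float]:
    """100 x (good - bad) / total over a rolling window.

    `history` is a hypothetical list of (date returned, was_good) pairs.
    Returns None when no WUs fall inside the window.
    """
    window = timedelta(days=window_days)
    # Keep only WUs returned within the window (and not in the future).
    recent = [ok for day, ok in history if timedelta(0) <= today - day <= window]
    if not recent:
        return None
    good = sum(recent)  # True counts as 1
    bad = len(recent) - good
    return 100 * (good - bad) / len(recent)


today = date(2006, 5, 27)
history = [(today, True)] * 98 + [(today, False)] * 2
print(quality_index(history, today))  # 96.0
```

Because bad WUs are subtracted rather than just left out, the index runs from -100 (all bad) to 100 (all good), and each bad WU costs twice: 2 bad out of 100 gives 96, not 98.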