World Community Grid Forums
Category: Completed Research Forum: AfricanClimate@Home Thread: Invalid results |
Thread Status: Active Total posts in this thread: 23
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The issue is possibly that a restart writes a few extra 'resume' bits. The validation process looks e.g. at a hash number to ensure all copies, each representing 1/10, make up the total as if a single result was received. If the hash deviates from the majority, that copy is resubmitted until it matches. If too many of the original set deviated, the whole set is resubmitted for a second opinion (looking at the or and times of a larger distribution set).
It's not that simple, though, as even clean runs, per the result log, have turned 'invalid' on a few done on my P4HT. As I'm running as a service, I never got to see the graphics, so the observation report of the line graph is certainly interesting. Maybe knreed or the other techs could look at the invalids and see if the non-critical bits are causing that anomaly? cheers
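The speculation above (a restart adding a few extra 'resume' bits that throw off the hash) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the result contents, the `#RESUME` marker, and the use of SHA-256 are hypothetical and are not taken from the actual WCG validator.

```python
import hashlib

def digest(data: bytes) -> str:
    """Hash the full result payload (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(data).hexdigest()

def strip_resume_markers(data: bytes, marker: bytes = b"#RESUME\n") -> bytes:
    """Drop the hypothetical non-critical marker a restart might leave behind."""
    return data.replace(marker, b"")

# Hypothetical outputs: the same numbers, but one run was restarted midway.
clean_run = b"t=0 1.00\nt=1 1.25\nt=2 1.50\n"
restarted_run = b"t=0 1.00\nt=1 1.25\n#RESUME\nt=2 1.50\n"

# A hash over the raw bytes sees the two runs as different results...
hashes_differ = digest(clean_run) != digest(restarted_run)
# ...while hashing only the critical content makes them agree again.
hashes_match_after_strip = digest(clean_run) == digest(strip_resume_markers(restarted_run))
```

If the validator hashed only the scientifically relevant part of the output, a restart that adds bookkeeping bytes would not by itself cause a mismatch; whether that is what happens with ACH is exactly the open question here.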
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With so many Invalid results occurring with ACH, why don't the techs increase the number of initial replications from 10?
At least as a temporary workaround for faster turnaround.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello bijoalex,
I am sure the techs will do whatever they think is needed. But I have recently heard that the grad students will be working on their doctoral dissertations for this project after we return cycle 27 - the final 2 week period of the year. So I expect that AfricanClimate@Home will go on hiatus for an extended period until the next set of grad students take up the research project and refine the program in light of the initial results. So the time factor is probably not very important. Lawrence |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That's an interesting insight. I thought the time factor was important, as one cycle of ACH WUs can be sent out only after the previous ones are back after processing.
Moreover, if the grad students are waiting for the results to start their doctoral dissertations, wouldn't they want the results to be available as early as possible?
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Maybe knreed or the other techs could look at the invalids and see if the non-critical bits are causing that anomaly?"
Seems that something changed in the validation process. My above mentioned WU ach1_23_48_9-- has got a valid result now. It was on "inconclusive" after the first 10 results came back (+2 for errors) and another set of 10 WUs was sent out. At the moment we are at 27 (!!!) replications but with an astonishing 23 valid results (2 errors, 2 still in progress). Greetings Thorsten
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For what it's worth, I had one host BSoD last night -- it was crunching something else at the time, but an ACH WU was waiting for its turn. After the post-crash cleanup I resumed BOINC and could soon see the bump in the graph. Figured the result would be screwed up, but it did validate. I have an earlier result from the same box that is still inconclusive, and has a bump due to a software-installation reboot.
So while resuming from a checkpoint might frequently cause a validation problem, it appears not to do so 100% of the time. I guess that fits with some currently speculated causes. [Edit: XP Pro, CC 5.10.30]
I'm curious about the "bump" -- it certainly does seem to correlate with work unit restarts. Why might this happen?
[Edit 1 times, last edit by Former Member at Mar 4, 2008 7:27:58 PM]
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The image I have is that once the first 10 without raw errors are back, a test is applied and, e.g., if 8 agree on the control hash, 2 will be sent out for recomputation. Eventually, if too many do not agree, i.e. the set is inconclusive, the whole set is resubmitted in one go. Indeed, after the return of those 10 it might turn out that the original majority was in fact valid all along.
Anyway, the admin advised that the next phase will be done differently. The world of bandwidth may look upgraded again by the end of this year or 2009, so the whole split might go differently, similar to CPDN where e.g. large units are sent out and trickles are returned.... nothing has been developed yet on how to approach the validation and distribution methodology for the next step.
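The check described in this post (compare control hashes once the first batch is back, resend only the outliers, or resend everything if there is no clear majority) can be sketched roughly as follows. This is only a sketch of the idea as described here, not the real WCG/BOINC validator; the function name `review_batch`, the status labels, and the `min_agree` threshold are hypothetical.

```python
from collections import Counter

def review_batch(hashes, min_agree=6):
    """Decide what to do with one batch of returned copies.

    hashes: dict mapping copy id -> control hash, for copies that came
            back without a raw error.
    min_agree: hypothetical number of matching hashes needed to trust
               the majority (the real threshold is not known here).
    Returns (status, copies_to_resend).
    """
    if not hashes:
        return "inconclusive", []
    majority_hash, votes = Counter(hashes.values()).most_common(1)[0]
    if votes < min_agree:
        # Too many disagree: the whole set goes out again in one go.
        return "inconclusive", sorted(hashes)
    # Clear majority: only the deviating copies are recomputed.
    outliers = sorted(c for c, h in hashes.items() if h != majority_hash)
    return ("valid" if not outliers else "partial"), outliers

# The example from the post: 8 of 10 agree, so 2 copies are sent out again.
batch = {i: "aaa" for i in range(8)}
batch[8] = "bbb"
batch[9] = "ccc"
status, resend = review_batch(batch)
```

With this input, `status` is `"partial"` and `resend` is `[8, 9]`; a batch where only four copies agree would come back `"inconclusive"` with all ten copy ids listed for resubmission.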
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Seems that something changed in the validation process. My above mentioned WU ach1_23_48_9-- has got a valid result now. It was on "inconclusive" after the first 10 results came back (+2 for errors) and another set of 10 WUs were sent out. At the moment we are at 27 (!!!) replications but with astonishing 23 valid results (2 errors, 2 still in progress)."
Maybe this is one of the effects of the new BOINC server code and/or misconfigured validators (see knreed's post here). Especially this:
"We apologize for this bug and we are installing corrected validators now."
Cheers Thorsten
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Not really. I've seen this a number of times in the past, where in the end a majority gets validated. As said, the AC@H future is going to see a different way of distribution.
||
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline Project Badges: |
"I hope this helps to find the reason for such behaviour. But for now I would recommend to avoid any restarts of ACH workunits. Otherwise there is a high risk to produce an invalid result (at least on my machine)."
I have had only one invalid, but I might have turned off the machine. I have tried to be careful to check for an AC@H after that and just let it run. Another thread mentions spontaneous restarts.
SUPPORT ADVISOR
Work+GPU i7 8700 12threads School i7 4770 8threads Default+GPU Ryzen 7 3700X 16threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24threads Home i7 3540M 4threads50% |