World Community Grid Forums
Category: Completed Research > Forum: Help Cure Muscular Dystrophy - Phase 2 > Thread: Invalid results in new version
Thread Status: Active. Total posts in this thread: 37
My 2 cents worth
Cruncher; Joined: Sep 12, 2008; Post Count: 19
Actual stats for HCMD2 using version 6.13:

[Table: per-platform counts of Success, Error, Valid, Invalid, Pending and Inconclusive results, with % Potential Invalid]

% potential invalid = 100 × (½ × inconclusive + invalid) / (valid + invalid + inconclusive)

A number of the higher invalid rates on Linux are due to 1 machine (for Linux AMD) and 1 user (for Linux Intel) that are returning garbage results, and to the fact that results returned with 6.11 do not appear to match those from 6.13 (likely due to floating-point differences after the compiler options were changed). Because the limits on the number of workunits are much higher than normal, it is taking longer than usual to restrict how many workunits the troublesome machines receive.

Nice, Apple wins again!
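The potential-invalid formula is easy to misread because of operator precedence, so here is a minimal Python sketch of it. The function name `potential_invalid_pct` and the sample counts are hypothetical; only the formula itself comes from the stats post.

```python
def potential_invalid_pct(valid: int, invalid: int, inconclusive: int) -> float:
    """% potential invalid = 100 * (inconclusive / 2 + invalid) / (valid + invalid + inconclusive).

    Half of the inconclusive results are counted as likely invalids.
    Pending, success and error counts do not enter the formula.
    """
    total = valid + invalid + inconclusive
    if total == 0:
        return 0.0  # nothing validated or inconclusive yet
    return 100.0 * (0.5 * inconclusive + invalid) / total


# Hypothetical platform: 90 valid, 6 invalid, 4 inconclusive
print(potential_invalid_pct(90, 6, 4))  # 8.0
```

Reading the original "1/2 inconclusive + invalid / (...)" literally would divide only the invalid count by the total, which is why the numerator is parenthesised here.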
Sekerob
Ace Cruncher; Joined: Jul 24, 2005; Post Count: 20043
Another example:

Result name | Version | Status | Sent | Returned | CPU time (h) | Claimed / granted credit
CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_2 | 6.13 | Invalid | 5/26/09 21:04:29 | 5/27/09 03:46:57 | 0.83 | 11.8 / 5.9
CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_1 | 6.11 | Valid | 5/12/09 17:22:07 | 5/13/09 07:53:46 | 0.86 | 12.7 / 12.2 <=== Mine
CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_0 | 6.11 | Valid | 5/12/09 17:18:22 | 5/28/09 18:34:06 | 0.92 | 11.8 / 12.2

We understand this one. Strictly speaking, the bottom result was returned far too late, but as it happened, it allowed the two 6.11 tasks to match. Such is randomness. Fortunately, once a quorum of 2 is complete, the servers mark any remaining open tasks as redundant. Then, the next time the client talks to the server, the spare task is automatically canceled if it has not started yet, so your 44 potential invalids are a worst case.

An afterthought would have been to cancel all open 6.11 tasks and reissue them with 6.13, but that's too late. Originally it was thought that 6.11 and 6.13 results were compatible. Personally, I have not had a single case; all of mine are validating. This is probably a case of a small buffer meeting a bigger buffer with low contact frequency.
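A simplified Python model of the redundancy handling described above may help show why 44 is only a worst case: a spare task can still come back only if the client had already started it before its next server contact. The `Task` class, the state names and both functions are hypothetical illustrations, not WCG or BOINC server code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    state: str             # hypothetical states: "in_progress", "returned", "redundant", "cancelled"
    started: bool = False  # has the client begun crunching this task?


def on_quorum_complete(tasks: list[Task]) -> None:
    """Server side (sketch): once 2 results agree, flag every task still out in the field."""
    for t in tasks:
        if t.state == "in_progress":
            t.state = "redundant"


def on_client_contact(task: Task) -> str:
    """Client side (sketch): a redundant task that has not started is cancelled;
    one that is already running is left alone and may still be returned."""
    if task.state == "redundant" and not task.started:
        task.state = "cancelled"
    return task.state


tasks = [
    Task("wu_0", "returned"),
    Task("wu_1", "returned"),
    Task("wu_2", "in_progress", started=False),  # spare copy still sitting in a buffer
    Task("wu_3", "in_progress", started=True),   # spare copy already crunching
]
on_quorum_complete(tasks)
print([on_client_contact(t) for t in tasks])
# ['returned', 'returned', 'cancelled', 'redundant']
```

In this model a large buffer combined with infrequent server contact keeps spare tasks unstarted for longer, which is the situation Sekerob suspects.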
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
Former Member
Cruncher; Joined: May 22, 2018; Post Count: 0
Sekerob wrote: "Afterthought would have been to cancel all open 6.11 tasks and reissue with 6.13, but that's too late. Originally it was thought 6.11 and 6.13 results were compatible."

Even though that testing wasn't done, if the results from 6.11 and 6.13 are both usable, the validator routine should recognise that the results are within a valid range and mark them as valid!

I have quite a number of these coming through now. Not as many as I expected, as some of the machines that hadn't replied are suddenly producing results a day or two before the deadline. And some aren't. For example:

Result name | Version | Status | Sent | Returned | CPU time (h) | Claimed / granted credit
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_3 | 6.13 | Valid | 30/05/09 23:08:21 | 31/05/09 17:51:34 | 2.85 | 16.6 / 18.3
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_2 | 6.13 | Valid | 30/05/09 07:59:00 | 30/05/09 21:55:58 | 3.47 | 20.1 / 18.3
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_0 | 6.11 | Invalid | 16/05/09 05:45:30 | 16/05/09 15:16:08 | 1.98 | 12.3 / 6.2 <- mine
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_1 | 6.11 | Aborted | 16/05/09 05:45:30 | 30/05/09 06:36:42 | 0.00 | 0.0 / 0.0
Sekerob
Ace Cruncher; Joined: Jul 24, 2005; Post Count: 20043
Should, could, would... and how much programming would be needed, and what tolerance? And which of the 2 results in a quorum would then be the one passed to the assimilator for the master database? The premise is a quorum of 2 with bit-for-bit comparison and agreement. If there is some fallout in the transition, so be it, but that's my view.
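To make the "quorum 2, bit-for-bit" premise concrete, here is a minimal Python sketch; `check_quorum` and the sample byte strings are hypothetical and this is not the WCG validator. It also shows why exact matching sidesteps the assimilator question: when the agreeing results are byte-identical, it does not matter which one is forwarded.

```python
from typing import Optional


def check_quorum(outputs: list[bytes], quorum: int = 2) -> tuple[Optional[int], list[str]]:
    """Bit-for-bit quorum check (simplified sketch).

    Returns the index of a canonical result plus a verdict for each output.
    Agreeing outputs are byte-identical, so any member of the matching group
    can serve as the canonical result; this sketch takes the first one.
    """
    for i, candidate in enumerate(outputs):
        if sum(o == candidate for o in outputs) >= quorum:
            verdicts = ["valid" if o == candidate else "invalid" for o in outputs]
            return i, verdicts
    # No quorum yet: everything stays inconclusive until another result arrives.
    return None, ["inconclusive"] * len(outputs)


# Hypothetical outputs in the order 6.11, 6.13, 6.11; the 6.13 one differs in the last digit.
print(check_quorum([b"0.1234567", b"0.1234568", b"0.1234567"]))
# (0, ['valid', 'invalid', 'valid'])
print(check_quorum([b"0.1234567", b"0.1234568"]))
# (None, ['inconclusive', 'inconclusive'])
```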
Former Member
Cruncher; Joined: May 22, 2018; Post Count: 0
We'll see. If the large number of invalid 6.13 results turns out to be due to, say, rounding differences in floating-point units, then having some leeway would fix both sources of invalid results.
Former Member
Cruncher; Joined: May 22, 2018; Post Count: 0
Hi Kremmen,
The complicated validator that accepts results that are close together but not exactly the same is the one being used by HPF2. So far, the project scientists are not beating down the door asking for similar validators to be written for their projects. There may be reasons not to go this way, such as server overload. But if it does make sense, I expect that over the years we shall move that way.

Lawrence
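For contrast with bit-for-bit matching, here is a minimal Python sketch of a tolerance-based comparison of the kind described for HPF2. The result format (whitespace-separated numbers), the function names and the tolerance values are all hypothetical placeholders; the thread does not describe how the HPF2 validator actually works.

```python
import math


def parse_values(output: str) -> list[float]:
    """Hypothetical result format: whitespace-separated floating-point numbers."""
    return [float(tok) for tok in output.split()]


def fuzzy_match(a: str, b: str, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Accept two results whose numbers agree within a tolerance.

    The tolerances are illustrative; a real project would need its scientists
    to choose them for each output quantity.
    """
    va, vb = parse_values(a), parse_values(b)
    if len(va) != len(vb):
        return False  # structurally different output is never accepted
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(va, vb))


print(fuzzy_match("0.6 -12.25", "0.6000000000000001 -12.25"))  # True
print(fuzzy_match("0.6 -12.25", "0.61 -12.25"))                # False
```

Unlike the exact check, a tolerance match leaves open Sekerob's question of which of two slightly different results goes to the assimilator, and it means parsing every returned output on the server, which plausibly adds to the load Lawrence mentions.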
Sekerob
Ace Cruncher; Joined: Jul 24, 2005; Post Count: 20043
Seemingly not all 6.11/6.13 combinations are failing. I just had 3 rush jobs validate normally:
Result name | Version | Status | Sent | Returned | CPU time (h) | Claimed / granted credit
CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_2 | 6.13 | Valid | 1-6-09 14:31:57 | 1-6-09 17:30:21 | 2.03 | 33.3 / 33.8 < mine
CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_0 | 6.13 | Error | 29-5-09 18:57:09 | 1-6-09 13:10:09 | 0.00 | 0.0 / 0.0
CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_1 | 6.11 | Valid | 22-5-09 07:59:25 | 22-5-09 22:20:32 | 1.91 | 34.3 / 33.8
JmBoullier
Former Community Advisor; Normandy - France; Joined: Jan 26, 2007; Post Count: 3715
Sek,
The most common reason for the 6.11/6.13 invalids is that a different compiler or different compiler options (or maybe both) were used under Linux to avoid the problem that affected P3 machines. Your chances of running into one under Windows are extremely low.

Cheers, Jean.
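Why would a compiler change alone break bit-for-bit validation? Floating-point arithmetic is not associative, so a build that evaluates the same operations in a different order or precision can change the last bits of a result. A minimal Python illustration, using evaluation order as a stand-in for whatever the 6.13 Linux build actually changed (the thread does not say which options were involved):

```python
import math

# Same three numbers, two evaluation orders. IEEE 754 doubles are rounded
# after every operation, so the order changes the last bits of the answer.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left, right)                # 0.6000000000000001 0.6
print(left == right)              # False: a bit-for-bit validator rejects the pair
print(math.isclose(left, right))  # True: a tolerance-based check would accept it
print(left.hex(), right.hex())    # the two values differ only in the last bit of the mantissa
```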
Sekerob
Ace Cruncher; Joined: Jul 24, 2005; Post Count: 20043
Ergo, much ado about a molehill under a microscope ;>)
Former Member
Cruncher; Joined: May 22, 2018; Post Count: 0
Sekerob wrote: "Ergo, much ado about a molehill under a microscope ;>)"

Yes, but to me it is a mountain through a telescope, as 9 of my 12 cores are Linux only.

Sekerob wrote: "We understand this one. Strictly the bottom result was way too late, but as it was, it caused the 2 6.11 tasks to match. Such is randomness. Fortunately the servers mark open tasks when quorum 2 is complete as redundant. Then there is that moment when the client talks to the server and the spare task gets automatically canceled, if it has not started yet, so your 44 potential is a worst case."

You are right about 44 being a worst case. I scrutinized the results a little more closely and found 10 of those to be from a Windows machine, so they should validate okay. So far, it appears I now have 20 invalids from 6.11 vs. 6.13 pairings, which is nearly 33% of the lot in question and around a day or two of lost CPU time. I sure hope they use the same compiler and options when they bring out the newer version with progress indicators for the slow rotators.