Thread Status: Active
Total posts in this thread: 37
This topic has been viewed 3941 times and has 36 replies
My 2 cents worth
Cruncher
Joined: Sep 12, 2008
Post Count: 19
Re: Invalid results in new version

Actual stats for HCMD2 using version 6.13:

Platform        Success  Error  Valid  Invalid  Pending  Inconclusive  % Potential Invalid
Darwin/Intel       7082     15   5271        0     1811             0                   0%
Linux/AMD          4655     22   3321       41      918           375                 6.1%
Linux/Intel       13993    102  11278      276     1920           519                 4.4%
Windows/AMD       19803    433  14671       13     5088            31                 0.2%
Windows/Intel     88729   1376  64183       38    24453            55                 0.1%

% potential invalid = (inconclusive / 2 + invalid) / (valid + invalid + inconclusive) * 100
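As a quick check of the formula, here is a small Python sketch that recomputes the last column from the counts in the table. The dictionary just re-types those counts, `potential_invalid_pct` is a made-up helper name, and counting half of the inconclusive results as future invalids is the poster's estimate, not a validator rule.

```python
# Counts re-typed from the HCMD2 version 6.13 table above.
STATS = {
    "Darwin/Intel": (5271, 0, 0),  # (valid, invalid, inconclusive)
    "Linux/AMD": (3321, 41, 375),
    "Linux/Intel": (11278, 276, 519),
    "Windows/AMD": (14671, 13, 31),
    "Windows/Intel": (64183, 38, 55),
}


def potential_invalid_pct(valid: int, invalid: int, inconclusive: int) -> float:
    """(inconclusive / 2 + invalid) / (valid + invalid + inconclusive) * 100.

    Half of the inconclusive results are assumed to end up invalid.
    """
    decided = valid + invalid + inconclusive
    if decided == 0:
        return 0.0
    return 100.0 * (inconclusive / 2 + invalid) / decided


for platform, counts in STATS.items():
    print(f"{platform:<14} {potential_invalid_pct(*counts):.1f}%")
```

Rounded to one decimal, this reproduces the 0%, 6.1%, 4.4%, 0.2% and 0.1% shown in the table. Pending results are left out of the denominator because they have not been compared yet.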

A number of the higher invalid rates on Linux are due to 1 machine (Linux/AMD) and 1 user (Linux/Intel) returning garbage results, and to the fact that results returned with 6.11 apparently do not match those from 6.13 (likely because of floating-point differences after the compiler options were changed). Because the limits on the number of workunits are currently much higher than normal, it is taking longer than usual to cut back the number of workunits the troublesome machines receive.

Nice, Apple wins again :D
----------------------------------------


[Jun 1, 2009 2:19:09 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Invalid results in new version

Another example:

CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_2-- 613 Invalid 5/26/09 21:04:29 5/27/09 03:46:57 0.83 11.8 / 5.9
CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_1-- 611 Valid 5/12/09 17:22:07 5/13/09 07:53:46 0.86 12.7 / 12.2 <===Mine
CMD2_0001-GPDAA.clustersOccur-MYH6.clustersOccur_1614_0-- 611 Valid 5/12/09 17:18:22 5/28/09 18:34:06 0.92 11.8 / 12.2

We understand this one. Strictly speaking, the bottom result came back far too late, but as it happened, it allowed the two 6.11 tasks to match. Such is randomness. Fortunately, once a quorum of 2 is complete, the servers mark any still-open tasks as redundant. The next time the client talks to the server, the spare task is canceled automatically if it has not started yet, so your 44 potential invalids are a worst case.
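A rough Python sketch of the quorum behaviour described above, under a simplified model: every returned result counts toward the quorum (real validation also requires the results to agree), and the class, method and state names are hypothetical, not BOINC's or WCG's actual server code.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    state: str = "in_progress"  # in_progress, returned, redundant or canceled
    started: bool = False       # has the client begun crunching this task?


@dataclass
class Workunit:
    quorum: int
    tasks: list = field(default_factory=list)

    def task(self, name: str) -> Task:
        return next(t for t in self.tasks if t.name == name)

    def result_returned(self, name: str) -> None:
        self.task(name).state = "returned"
        returned = sum(t.state == "returned" for t in self.tasks)
        if returned >= self.quorum:
            # Quorum complete: copies still out in the field are no longer needed.
            for t in self.tasks:
                if t.state == "in_progress":
                    t.state = "redundant"

    def client_contact(self, name: str) -> None:
        # When the client next talks to the server, a redundant task is
        # canceled only if the client has not started it yet.
        t = self.task(name)
        if t.state == "redundant" and not t.started:
            t.state = "canceled"


# Two copies come back, then the host holding the third copy checks in
# before it has started that task.
wu = Workunit(quorum=2, tasks=[Task("A"), Task("B"), Task("C")])
wu.result_returned("B")
wu.result_returned("A")
wu.client_contact("C")
print(wu.task("C").state)  # canceled
```

If the spare copy had already started (as the 6.13 task in the example above had), it stays "redundant" in this model, runs to completion and can still be judged invalid, which is why 44 is only the worst case.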

In hindsight, it would have been better to cancel all open 6.11 tasks and reissue them with 6.13, but it is too late for that now. Originally it was thought that 6.11 and 6.13 results were compatible. Personally, I have not had a single such case; all of mine have validated. This is probably a case of a small work buffer meeting a bigger one, combined with low contact frequency.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jun 1, 2009 7:59:40 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Invalid results in new version

In hindsight, it would have been better to cancel all open 6.11 tasks and reissue them with 6.13, but it is too late for that now. Originally it was thought that 6.11 and 6.13 results were compatible.

Even though that testing wasn't done, if the results from 6.11 and 6.13 are both usable, the validator should recognise that they are within a valid range and mark them as valid!

I have quite a number of these coming through now, though not as many as I expected, because some of the machines that hadn't replied are suddenly returning results a day or two before the deadline. Others aren't.
For example:
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_3-- 613 Valid 30/05/09 23:08:21 31/05/09 17:51:34 2.85 16.6 / 18.3
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_2-- 613 Valid 30/05/09 07:59:00 30/05/09 21:55:58 3.47 20.1 / 18.3
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_0-- 611 Invalid 16/05/09 05:45:30 16/05/09 15:16:08 1.98 12.3 / 6.2 <- mine
CMD2_0001-DHRS3.clustersOccur-MYH3.clustersOccur_1369_1-- 611 Aborted 16/05/09 05:45:30 30/05/09 06:36:42 0.00 0.0 / 0.0
[Jun 1, 2009 2:48:52 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Invalid results in new version

Should, could, would... and how much programming would be needed, and with what tolerance? And which of the two results in a quorum would then be the one passed to the assimilator for the master database? The premise is a quorum of 2 with bit-for-bit comparison and agreement. If there is some fallout during the transition, so be it; that's my view.
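A minimal Python sketch of the strict rule being defended here, assuming results can be compared as raw bytes: the quorum of 2 is met only when two outputs are bit-for-bit identical, and because the matching outputs are the same bytes, it does not matter which copy goes to the assimilator. The function names and the SHA-256 fingerprint are illustrative choices, not WCG's validator.

```python
import hashlib
from collections import Counter


def digest(output: bytes) -> str:
    """Fingerprint a result file; equal digests mean bit-for-bit equal output."""
    return hashlib.sha256(output).hexdigest()


def strict_validate(outputs: dict, quorum: int = 2):
    """Return (canonical_digest, verdicts) once `quorum` identical outputs exist.

    Returns None while no group of identical outputs is large enough
    (the workunit stays inconclusive and another copy would be sent out).
    """
    if not outputs:
        return None
    counts = Counter(digest(o) for o in outputs.values())
    best, n = counts.most_common(1)[0]
    if n < quorum:
        return None
    verdicts = {
        task: "valid" if digest(o) == best else "invalid"
        for task, o in outputs.items()
    }
    # Every "valid" copy has the same bytes, so any of them can go to the assimilator.
    return best, verdicts


# One result differs only in its last digit, which is enough to make it invalid.
print(strict_validate({"_0": b"0.83", "_1": b"0.83", "_2": b"0.84"}))
```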
[Jun 1, 2009 3:04:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Invalid results in new version

We'll see. If the large number of invalid results in 6.13 turns out to be due to, say, rounding differences in floating-point units, then allowing some leeway would fix both sources of invalid results.
[Jun 1, 2009 3:38:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Invalid results in new version

Hi Kremmen,
The complicated validator that accepts results that are close together but not exactly the same is the one being used by HPF2. So far, the project scientists are not beating the door down asking for similar validators to be written for their projects. There may be reasons not to go this way, such as server overload. But if it does make sense, I expect that over the years we shall move that way.
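For contrast, a tolerance-based comparison of the kind Lawrence describes might look like the Python sketch below. It assumes each result is a list of floating-point numbers; `fuzzy_match` is a hypothetical name, and the `rel_tol`/`abs_tol` values are invented for illustration and say nothing about what HPF2 actually uses.

```python
import math


def fuzzy_match(a: list, b: list, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """True if two result vectors agree element by element within tolerance.

    The tolerances are placeholders for illustration, not HPF2's real settings.
    """
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))


# Hypothetical outputs that differ only in the last bit of one value.
result_a = [1.0000000000000002, 2.5]
result_b = [1.0, 2.5]
print(result_a == result_b)             # False: a bit-for-bit check rejects the pair
print(fuzzy_match(result_a, result_b))  # True: the difference is a rounding error
```

The extra cost is visible even here: every value has to be parsed and compared, and someone still has to decide which of two "close enough" results becomes the canonical one.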

Lawrence
[Jun 1, 2009 5:29:14 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Invalid results in new version

Apparently not all 6.11/6.13 combinations fail. I just had 3 rush jobs validate normally:

CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_2-- 613 Valid 1-6-09 14:31:57 1-6-09 17:30:21 2.03 33.3 / 33.8 < mine
CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_0-- 613 Error 29-5-09 18:57:09 1-6-09 13:10:09 0.00 0.0 / 0.0
CMD2_0002-MYH6.clustersOccur-MYH6.clustersOccur_9608_1-- 611 Valid 22-5-09 07:59:25 22-5-09 22:20:32 1.91 34.3 / 33.8
[Jun 1, 2009 5:55:47 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Invalid results in new version

Sek,
The most common reason for the 6.11 vs 6.13 invalids is that a different compiler or different compiler options (or maybe both) are used under Linux, to avoid the problem that affected P3 machines.
Your chances of seeing one under Windows are extremely low... ;)
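Why a compiler change can break bit-for-bit agreement: floating-point addition is not associative, so an optimizer that is allowed to reorder operations (GCC's `-ffast-math` permits this, for example) can change the last bits of a result. The thread does not say which options were changed for Linux; this Python sketch only shows the effect of evaluation order on its own, and `left_to_right` is a hypothetical helper.

```python
def left_to_right(values: list) -> float:
    """Add values strictly in the order given.

    A plain loop is used because Python 3.12+'s built-in sum() applies
    compensated summation to floats, which would hide the effect.
    """
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, summed in two different orders.
print(left_to_right([1e16, 1.0, -1e16]))  # 0.0: the 1.0 is lost when added to 1e16
print(left_to_right([1e16, -1e16, 1.0]))  # 1.0
```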

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Jun 1, 2009 6:46:55 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Invalid results in new version

Ergo, much ado about a molehill under a microscope ;>)
[Jun 1, 2009 6:53:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Invalid results in new version

Ergo, much ado about a molehill under a microscope ;>)

Yes, but to me it is a mountain through a telescope, as 9 of my 12 cores are Linux-only :)

You are right about 44 being a worst case. I scrutinized the results a little more closely and found that 10 of them were from a Windows machine, so they should validate okay. So far it appears I now have 20 6.11 vs 6.13 invalids, which is nearly 33% of the lot in question and around a day or two of lost CPU time :'(. I sure hope they use the same compiler and options when they bring out the newer version with progress indicators for the slow rotators.
[Jun 1, 2009 10:26:11 PM]