Thread Status: Active
Total posts in this thread: 37
Posts: 37   Pages: 4   [ Previous Page | 1 2 3 4 | Next Page ]
[ Jump to Last Post ]
Post new Thread
This topic has been viewed 3917 times and has 36 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results in new version

The complicated validator that accepts results that are close together but not exactly the same is the one being used by HPF2.

I didn't realise there was only one project doing that.

My Intel machines are heading towards 100% invalid on HCMD2, so I just hope something gets done about the problem soon. In the meantime, some of my machines are taking a break from HCMD2 and doing some nice, reliable Beta units instead (now, there's a phrase you don't see every day!).
[Jun 3, 2009 4:01:46 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results in new version

Are you continuing to see issues with HCMD2 in production?

In a very big way.

Over the last week, breakdown by CPU type (excluding all the 611 vs 613 invalid entries) is roughly:
Athlon: 100% valid
P4: 90% valid
P3: 0% valid
Celeron: 0% valid

The P4 was 100% valid running (many) 611 WUs. I wonder what is happening here? There was no obvious pattern to the pile of invalid results during the beta. There, they appeared to be spread evenly across all machine types, though the sample size is too small to be sure. Maybe there are particular types of WUs which cause the issues, given that the first 40 HCMD2 betas were all perfect and then lots of them (about 50%) were invalid and then the last set were about 10% invalid, which ended up being the average.
[Jun 4, 2009 3:01:03 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Invalid results in new version

Kremmem,

Your doubts have proven well founded. We continue to experience about a 5% invalid rate on Linux despite mismatches between 6.13 and 6.11 ending (due to the old workunits mostly being finished off).

We are working on figuring out exactly what is going on. Here is what we suspect at this time:

1) Old processors (Pentium II and older, and some Pentium IIIs) will never match the newest processors.
2) Medium-old processors (we still haven't figured this part out) will sometimes match other processors and sometimes will not. It appears to vary based on certain positions.

We are looking at breaking things into additional HR classes for Linux. Late yesterday I uploaded code that handles case #1 by placing those processors into their own HR class, separate from the other Intel processors. Results assigned after around 4:00 UTC today should validate when they are returned. Please continue to inform us of the progress.
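
As a rough illustration of the HR-class split described above, here is a minimal Python sketch. It is not the code that was uploaded: the function name `hr_class`, the class labels, the `p3_mismatches` flag, and the model pattern are all hypothetical, and the thread does not say which Pentium III models are affected.

```python
import re

# Assumed pattern for "Pentium II and older"; the thread lists no exact models.
OLD_INTEL = re.compile(r"\bPentium (II|Pro|MMX)\b", re.IGNORECASE)

def hr_class(os_name, vendor, model, p3_mismatches=False):
    """Return a label; only results from hosts with the same label are compared.

    p3_mismatches is a hypothetical per-host flag standing in for
    "some Pentium IIIs", which the thread leaves unspecified.
    """
    model = model.replace("(R)", "")  # "Pentium(R) III" -> "Pentium III"
    if os_name != "Linux":
        return f"{os_name}/{vendor}"
    if vendor == "GenuineIntel":
        old = OLD_INTEL.search(model) is not None
        old_p3 = p3_mismatches and "Pentium III" in model
        return "Linux/Intel-old" if (old or old_p3) else "Linux/Intel"
    return f"Linux/{vendor}"
```

With this sketch, a Pentium II host and an i7 host on Linux get different labels, so their results would never be paired for validation.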

We are also looking to see if we can use an alternate math library (we are using the Intel compiler and therefore the Intel math library currently) that will not experience these issues.

As we figure out what exactly is going on, we will let everyone know.

thanks,
Kevin
----------------------------------------
[Edit 1 times, last edit by knreed at Jun 4, 2009 1:50:25 PM]
[Jun 4, 2009 1:30:58 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Invalid results in new version

We continue to look at an alternate math library for a longer term solution.

However, we have been able to develop a set of rules for a fuzzy validator that lets us distinguish between two valid results from different processor families while still identifying incorrect results. We have just put this into production. This should eliminate the majority of the inconclusive results. We will continue to monitor.
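
The actual rules of the fuzzy validator are not given in this thread. As a minimal sketch of the general idea, assuming each result is a list of floating-point values and that "close enough" means agreement within a tolerance, a comparison might look like this in Python. The names `results_agree` and `validate` and the tolerance values are assumptions for illustration only.

```python
import math

def results_agree(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Two results agree if they have the same length and every pair of
    values is within tolerance (tolerances here are assumed, not WCG's)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )

def validate(results, quorum=2, **tol):
    """Return indices of results that agree with at least quorum - 1 others."""
    valid = []
    for i, r in enumerate(results):
        matches = sum(
            results_agree(r, other, **tol)
            for j, other in enumerate(results)
            if j != i
        )
        if matches >= quorum - 1:
            valid.append(i)
    return valid
```

For example, `validate([[1.0, 2.0], [1.0000001, 2.0000002], [1.0, 2.5]])` accepts the first two results, which differ only in the last digits, and rejects the third.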

Kremmem - we re-ran your invalid results for those results that had not already been deleted to verify our fix in production. You now have several marked valid on your older computers. Thank you for continuing to insist that there was a problem.
[Jun 4, 2009 9:07:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results in new version

After your initial fix, everything was fine. Now, some more WUs are getting marked as invalid.

e.g.
CMD2_ 0006-1S2Q_ B.clustersOccur-2Z5X_ A.clustersOccur_ 37_ 81812_ 82155_ 2-- 613 Valid 6/6/09 02:52:40 6/6/09 05:56:21 1.80 34.6 / 34.6
CMD2_ 0006-1S2Q_ B.clustersOccur-2Z5X_ A.clustersOccur_ 37_ 81812_ 82155_ 0-- 613 Valid 6/5/09 19:05:35 6/5/09 21:54:19 1.19 24.4 / 34.6
CMD2_ 0006-1S2Q_ B.clustersOccur-2Z5X_ A.clustersOccur_ 37_ 81812_ 82155_ 1-- 613 Invalid 6/5/09 19:00:40 6/6/09 01:47:48 4.01 31.6 / 10.2

CMD2_ 0003-1I7X_ B.clustersOccur-2V5Y_ A.clustersOccur_ 58_ 755688_ 757618_ 2-- 613 Valid 6/5/09 12:42:34 6/5/09 23:49:13 4.17 74.9 / 108.5
CMD2_ 0003-1I7X_ B.clustersOccur-2V5Y_ A.clustersOccur_ 58_ 755688_ 757618_ 1-- 613 Valid 6/5/09 04:22:46 6/5/09 12:38:24 4.31 123.3 / 94.2
CMD2_ 0003-1I7X_ B.clustersOccur-2V5Y_ A.clustersOccur_ 58_ 755688_ 757618_ 0-- 613 Invalid 6/5/09 04:20:58 6/5/09 09:33:01 4.28 23.2 / 11.6
[Jun 6, 2009 10:04:21 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Invalid results in new version

After your initial fix, everything was fine. Now, some more WUs are getting marked as invalid.

After the application of the new fuzzy validator logic you should be seeing more valids than you had before the change... 100% was not promised.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jun 6, 2009 10:37:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results in new version

Kevin,

Since your fixes, I've still seen a few invalid results in 6.13 and in 6.14 beta. Every WU that completed in less than 4 hours has been marked valid. Every invalid that I've seen has been on WUs that exceeded the 4 hours max runtime.
e.g.
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 2-- 614 Valid 10/06/09 11:08:48 10/06/09 15:38:11 4.39 69.1 / 55.5
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 1-- 614 Valid 10/06/09 00:07:26 10/06/09 11:08:31 4.00 23.5 / 31.1
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 0-- 614 Invalid 10/06/09 00:06:56 10/06/09 05:44:55 4.01 21.5 / 8.1
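
To check this kind of pattern across a longer results list, the rows can be split at the 4-hour mark and counted. The Python sketch below parses rows in the layout pasted above. The column meanings are inferred from the rows themselves (the fourth field from the end is taken to be CPU hours), and `parse_row` and `invalid_by_runtime` are hypothetical helper names.

```python
# Rows copied from the post above; column meanings are inferred, not documented.
ROWS = """\
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 2-- 614 Valid 10/06/09 11:08:48 10/06/09 15:38:11 4.39 69.1 / 55.5
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 1-- 614 Valid 10/06/09 00:07:26 10/06/09 11:08:31 4.00 23.5 / 31.1
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 274_ 0-- 614 Invalid 10/06/09 00:06:56 10/06/09 05:44:55 4.01 21.5 / 8.1
"""

def parse_row(line):
    """Extract name, status and CPU hours from one pasted result row."""
    tok = line.split()
    # The result name ends at the token that finishes with "--".
    end = next(i for i, t in enumerate(tok) if t.endswith("--"))
    return {
        "name": " ".join(tok[:end + 1]),
        # Skip the version/device token; status may span several words.
        "status": " ".join(tok[end + 2:-8]),
        "cpu_hours": float(tok[-4]),
    }

def invalid_by_runtime(text, cutoff_hours=4.0):
    """Return {bucket: (invalid_count, total_count)} split at cutoff_hours."""
    counts = {"below": [0, 0], "at_or_above": [0, 0]}
    for line in text.splitlines():
        if not line.strip():
            continue
        row = parse_row(line)
        key = "below" if row["cpu_hours"] < cutoff_hours else "at_or_above"
        counts[key][0] += row["status"] == "Invalid"
        counts[key][1] += 1
    return {k: tuple(v) for k, v in counts.items()}
```

Three rows are far too few to show anything on their own; the point is only that the same split can be applied to a full results page.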
[Jun 11, 2009 8:12:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Invalid results in new version

Did you read the Beta Forum announcements on the possible need to split the homogeneous redundancy grouping further? There's a thread there for testing 6.14, where your post probably fits better. Version 6.13 has already been replaced by 6.14 in production based on the initial good results.
[Jun 11, 2009 8:23:34 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results in new version

Did you read the Beta Forum announcements on the possible need to split the homogeneous redundancy grouping further?

If particular machines get 100% valid results on <4 hour WUs and some greater proportion of invalid results on 4 hour WUs, that strikes me as unlikely to be a grouping issue. (Though it's possible that grouping like machines could fix it anyhow, as similar types would result in more similar speeds, which in turn would result in similar amounts of work done in a given time.)
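
To make that reasoning concrete, here is a toy Python model. It rests on an assumption this thread does not confirm: that a WU hitting the 4-hour limit stops after however many positions the host managed to finish. The function name and the numbers are purely illustrative.

```python
def positions_done(total_positions, positions_per_hour, limit_hours=4.0):
    """Positions completed before the (assumed) runtime cutoff kicks in."""
    return min(total_positions, int(positions_per_hour * limit_hours))

# Short WU: both hosts finish everything, so their outputs cover the same work.
fast_short = positions_done(1000, 300)   # 1000
slow_short = positions_done(1000, 200)   # 800 positions would fit, but see below

# Long WU: both hosts hit the cutoff at different points.
fast_long = positions_done(5000, 300)    # 1200
slow_long = positions_done(5000, 200)    # 800
```

Under this model, two hosts of different speed only cover identical work when the WU fits inside the limit on both of them, which is why grouping similar machines could mask the problem without being its cause.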
[Jun 11, 2009 10:00:16 AM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Invalid results in new version

Kremmem,

Your doubts have proven well founded. We continue to experience about a 5% invalid rate on Linux despite mismatches between 6.13 and 6.11 ending (due to the old workunits mostly being finished off).

We are working on figuring out exactly what is going on. Here is what we suspect at this time:

1) Old processors (Pentium II and older, and some Pentium IIIs) will never match the newest processors.
2) Medium-old processors (we still haven't figured this part out) will sometimes match other processors and sometimes will not. It appears to vary based on certain positions.

We are looking at breaking things into additional HR classes for Linux. Late yesterday I uploaded code that handles case #1 by placing those processors into their own HR class, separate from the other Intel processors. Results assigned after around 4:00 UTC today should validate when they are returned. Please continue to inform us of the progress.

We are also looking to see if we can use an alternate math library (we are using the Intel compiler and therefore the Intel math library currently) that will not experience these issues.

As we figure out what exactly is going on, we will let everyone know.

thanks,
Kevin


Just so you know, I had a 100% success rate during Beta on an i7 920 running Ubuntu 9.04.

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_ CMD2_ 0006-2A5AA.clustersOccur-ZPR1.clustersOccur_ 1_ 315_ 347_ 0-- MSI-I7-920 Valid 6/10/09 23:28:22 6/11/09 00:44:40 1.10 18.0 / 14.2
BETA_ CMD2_ 0006-2A5AA.clustersOccur-VINCA.clustersOccur_ 36_ 55247_ 55382_ 1-- MSI-I7-920 Valid 6/10/09 23:15:17 6/11/09 00:37:33 0.94 15.5 / 14.0
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PTENA.clustersOccur_ 24_ 112118_ 112883_ 0-- MSI-I7-920 Valid 6/10/09 22:29:35 6/10/09 23:39:09 1.06 17.4 / 26.1
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PTENA.clustersOccur_ 22_ 102152_ 102954_ 1-- MSI-I7-920 Valid 6/10/09 22:29:17 6/10/09 23:54:02 1.24 20.5 / 21.8
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PTENA.clustersOccur_ 19_ 88823_ 89871_ 1-- MSI-I7-920 Valid 6/10/09 22:28:55 6/10/09 23:58:24 1.36 22.4 / 17.9
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PTENA.clustersOccur_ 17_ 80869_ 81827_ 0-- MSI-I7-920 Valid 6/10/09 22:28:37 6/11/09 00:01:55 1.43 23.6 / 22.2
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PRDX6.clustersOccur_ 24_ 180464_ 181909_ 1-- MSI-I7-920 Valid 6/10/09 22:27:57 6/11/09 02:47:09 1.54 40.7 / 40.7
BETA_ CMD2_ 0006-2A5AA.clustersOccur-PRDX6.clustersOccur_ 26_ 196224_ 197344_ 1-- MSI-I7-920 Pending Validation 6/10/09 22:27:57 6/11/09 00:37:53 0.99 16.2 / 0.0
BETA_ CMD2_ 0006-2A5AA.clustersOccur-KCRMA.clustersOccur_ 48_ 192622_ 193241_ 0-- MSI-I7-920 Valid 6/10/09 20:48:52 6/10/09 22:24:20 0.96 14.7 / 29.4
BETA_ CMD2_ 0006-2A5AA.clustersOccur-EZRIA.clustersOccur_ 396_ 730661_ 730997_ 0-- MSI-I7-920 Valid 6/10/09 19:46:27 6/10/09 20:48:51 0.78 11.7 / 19.0
BETA_ CMD2_ 0006-2A5AA.clustersOccur-EZRIA.clustersOccur_ 358_ 661186_ 661636_ 1-- MSI-I7-920 Valid 6/10/09 19:43:26 6/10/09 22:24:20 1.21 18.5 / 21.3
BETA_ CMD2_ 0006-2A5AA.clustersOccur-EZRIA.clustersOccur_ 156_ 289006_ 289350_ 1-- MSI-I7-920 Valid 6/10/09 19:16:30 6/10/09 20:49:10 0.91 13.5 / 15.6
BETA_ CMD2_ 0006-2A5AA.clustersOccur-ARRB1A.clustersOccur_ 14_ 61269_ 62000_ 0-- MSI-I7-920 Valid 6/10/09 18:18:07 6/10/09 19:43:47 1.05 16.0 / 20.7
BETA_ CMD2_ 0006-2A5AA.clustersOccur-ACTS.clustersOccur_ 7_ 36619_ 37036_ 1-- MSI-I7-920 Valid 6/10/09 18:06:23 6/10/09 19:16:30 0.56 8.5 / 14.3
BETA_ CMD2_ 0006-2A5AA.clustersOccur-2A5EA.clustersOccur_ 86_ 299115_ 299577_ 0-- MSI-I7-920 Valid 6/10/09 17:34:28 6/10/09 19:43:25 0.80 12.2 / 11.1
BETA_ CMD2_ 0006-2A5AA.clustersOccur-VINCA.clustersOccur_ 55_ 0-- MSI-I7-920 Pending Validation 6/10/09 02:20:02 6/10/09 15:19:44 4.00 63.3 / 0.0
BETA_ CMD2_ 0006-2A5AA.clustersOccur-TELTA.clustersOccur_ 65_ 0-- MSI-I7-920 Valid 6/10/09 01:56:40 6/10/09 15:11:50 5.39 88.6 / 88.6
BETA_ CMD2_ 0006-2A5AA.clustersOccur-TELTA.clustersOccur_ 265_ 1-- MSI-I7-920 Valid 6/10/09 01:44:15 6/10/09 03:52:57 0.60 8.8 / 7.9
BETA_ CMD2_ 0006-2A5AA.clustersOccur-TELTA.clustersOccur_ 260_ 0-- MSI-I7-920 Valid 6/10/09 01:43:58 6/10/09 02:43:40 0.61 8.8 / 9.0
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 330_ 2-- MSI-I7-920 Valid 6/10/09 00:17:45 6/10/09 05:23:12 5.03 73.3 / 71.9
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 32_ 2-- MSI-I7-920 Valid 6/10/09 00:17:23 6/10/09 05:17:11 4.94 72.0 / 215.5
BETA_ CMD2_ 0006-2A5AA.clustersOccur-LSD1A.clustersOccur_ 63_ 0-- MSI-I7-920 Valid 6/10/09 00:13:51 6/10/09 13:55:41 5.18 94.4 / 100.5
BETA_ CMD2_ 0006-2A5AA.clustersOccur-EZRIA.clustersOccur_ 13_ 0-- MSI-I7-920 Valid 6/9/09 20:09:46 6/10/09 13:55:41 6.63 120.8 / 88.6
BETA_ CMD2_ 0006-2A5AA.clustersOccur-EZRIA.clustersOccur_ 124_ 1-- MSI-I7-920 Valid 6/9/09 20:09:28 6/10/09 03:52:33 4.06 59.2 / 59.2
BETA_ CMD2_ 0006-2A5AA.clustersOccur-DDX3XA.clustersOccur_ 38_ 1-- MSI-I7-920 Valid 6/9/09 19:49:05 6/10/09 05:17:11 4.87 71.0 / 63.8
BETA_ CMD2_ 0006-2A5AA.clustersOccur-2A5EA.clustersOccur_ 79_ 0-- MSI-I7-920 Valid 6/9/09 19:16:32 6/10/09 03:29:23 4.00 58.3 / 58.3
[Jun 11, 2009 12:36:23 PM]