World Community Grid Forums
Thread Status: Active Total posts in this thread: 171
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline
I think those two cases must have different causes.
Every BOINC project uses its own validator (it has to know what values to look for, which depends on its own science), and in general projects keep the exact process fairly private. But I think they all follow the same basic procedure:
1) Check that the output file(s) have been uploaded properly and have the right sort of 'shape': a reasonable size, the expected format elements, etc.
2) Compare the actual numerical results with those from another computer running the same workunit.
We know that WCG does use comparison checking; otherwise, why would they deliberately send all replications to 'similar' computers (same OS, same device class)? But although all iGPUs should be the same, in practice their accuracy varies. I think that's what has tripped up your iGPU example. The NVidia example, with every replication invalid, seems instead to have tripped over the first part of the test: something about the data has failed the 'sensible structure' check on the returned files.
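The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not WCG's actual validator: OPNG's real output format and WCG's thresholds are not public, so the "ligand score" line format, the size limits, the tolerance and the status names below are all hypothetical. Stage 1 rejects a file on its own if its structure looks wrong; stage 2 only compares numbers once both replicas are well formed, using a relative tolerance so that small device-to-device floating-point differences are not counted as a mismatch.

```python
from __future__ import annotations

import math

# All constants and the file format are HYPOTHETICAL: OPNG's real output
# format and WCG's real validation thresholds are not public.
MIN_BYTES = 10          # assumed smallest plausible result file
MAX_BYTES = 10_000_000  # assumed largest plausible result file
REL_TOL = 1e-3          # assumed tolerance for device-to-device drift


def parse_result(text: str) -> dict[str, float] | None:
    """Stage 1: structural check. Return {ligand: score}, or None if malformed."""
    size = len(text.encode("utf-8"))
    if not MIN_BYTES <= size <= MAX_BYTES:
        return None  # implausible file size
    scores: dict[str, float] = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            return None  # "unexpected item in the result file"
        ligand, raw = parts
        try:
            value = float(raw)
        except ValueError:
            return None  # score is not a number
        if not math.isfinite(value) or ligand in scores:
            return None  # NaN/inf, or a duplicated ligand line
        scores[ligand] = value
    return scores or None  # an empty file is also malformed


def results_match(a: dict[str, float], b: dict[str, float]) -> bool:
    """Stage 2: compare two well-formed results from different hosts."""
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[k], b[k], rel_tol=REL_TOL, abs_tol=1e-6) for k in a
    )


def validate_pair(text_a: str, text_b: str) -> tuple[str, str]:
    """Return a (hypothetical) status for each of two replicas of one workunit."""
    a, b = parse_result(text_a), parse_result(text_b)
    if a is None or b is None:
        # A malformed file is rejected on its own; a well-formed partner
        # has nothing to be compared with yet, so it waits.
        return ("invalid" if a is None else "pending",
                "invalid" if b is None else "pending")
    # Both files are well formed: compare the numbers. On a mismatch the
    # server would typically need a further replica to break the tie.
    verdict = "valid" if results_match(a, b) else "inconclusive"
    return verdict, verdict
```

With `good = "lig1 -7.25\nlig2 -6.80\n"`, `validate_pair(good, "lig1 -7.2501\nlig2 -6.80\n")` returns `("valid", "valid")`, while `validate_pair(good, "lig1 -7.25\nlig2\n")` returns `("pending", "invalid")` because the second file fails the structure check. In this sketch, tightening `REL_TOL` is what would make a slightly less accurate iGPU start failing stage 2 even though its file is well formed.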
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2179 Status: Offline
Another mix of Valids and Invalids on the same device, with several samples. Again: Invalids are normally rare on this device. Mine is the entry marked in blue.
workunit 799806294: OPNG_0082346_00063_3-- Linux Ubuntu 728 Server Aborted 9/4/21 13:22:15 9/5/21 05:01:34 0.00 0.0 / 0.0
workunit 797034683: OPNG_0081490_00056_3-- Linux Ubuntu 728 Server Aborted 9/2/21 13:57:04 9/2/21 20:32:07 0.00 0.0 / 0.0
workunit 796199435: OPNG_0081168_00029_3-- ManjaroLinux 728 Valid 8/31/21 21:38:25 8/31/21 22:28:55 0.21 2.1 / 619.8
workunit 796184219: OPNG_0081120_00040_4-- Linux Debian 728 Valid 8/31/21 14:52:02 9/1/21 00:05:45 0.08 1.2 / 871.5
Too bad that one task is marked Too Late (too late to validate), even though OPNG_0081120_00040_3-- was returned in time. Luckily it's not mine this time.
[Edit 3 times, last edit by adriverhoef at Sep 5, 2021 2:38:06 PM]
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 277 Status: Offline
Errors in OPNG units have generally been rare for me, but once in a while there's been a spate of malformed ones. I haven't seen any recently, in any case: I can go to the WCG stats page and filter by "Error" without any OPNG units showing up.
The other day, though, I got a few resends with 1½-day deadlines, all of which came up valid for me on a GTX 960.
[Edit 1 times, last edit by spRocket at Sep 1, 2021 11:47:00 AM]
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2179 Status: Offline
This time one of my devices with an Intel GPU returned a task that was declared Invalid, while just one wingman was enough to get a Valid result.
workunit 798259029: OPNG_0081873_00102_2-- Linux Ubuntu 728 Valid 9/2/21 20:56:51 9/3/21 02:16:55 0.59 1.9 / 1,000.3
EDIT: Two days later, another Invalid on that same device, even though it is still producing about one Valid result per hour.
workunit 800032096: OPNG_0082420_00056_4-- Linux Ubuntu 728 Server Aborted 9/4/21 19:37:05 9/5/21 11:35:01 0.00 0.0 / 0.0
'Invalid' is not the same as an Error status. With an Invalid, your task might still be counted towards Credits, Time and Results Returned; with an Error, however, your task doesn't count towards Credits, Time or Results Returned.
[Edit 1 times, last edit by adriverhoef at Sep 5, 2021 1:42:11 PM]
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2179 Status: Offline
It was time for another case of all Invalids, I guess …
workunit 803021600: OPNG_0083295_00022_4-- Linux Fedora 728 Server Aborted 9/7/21 15:03:10 9/7/21 15:27:34 0.00 0.0 / 0.0
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2179 Status: Offline
Another one bites the dust … (Intel GPU)
workunit 811943579: OPNG_0085681_00077_4-- Linux Ubuntu 728 Server Aborted 9/16/21 11:54:12 9/16/21 12:17:05 0.00 0.0 / 0.0
And another one gone …
workunit 808736232: OPNG_0084827_00243_4-- Linux Ubuntu 728 Server Aborted 9/12/21 00:39:42 9/13/21 10:46:06 0.00 0.0 / 0.0
[Edit 2 times, last edit by adriverhoef at Sep 16, 2021 6:55:54 PM]
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2179 Status: Offline
More all-Invalid workunits (NVIDIA):
workunit 813538160: OPNG_0085854_00229_4-- Linux Debian 728 Invalid 9/16/21 22:49:59 9/16/21 22:56:01 0.07 0.7 / 0.0
workunit 814164903: OPNG_0085889_00528_3-- Linux Gentoo 728 Too Late 9/16/21 21:17:23 9/16/21 21:21:36 0.05 0.7 / 0.0
workunit 813741495: OPNG_0085828_00048_4-- Linux Fedora 728 Invalid 9/16/21 11:09:06 9/16/21 11:13:20 0.06 0.6 / 0.0
workunit 813181825: OPNG_0085694_00446_3-- Linuxmint 728 Invalid 9/16/21 02:55:29 9/16/21 03:00:46 0.07 0.6 / 0.0
workunit 813318525: OPNG_0085785_00315_3-- Linux Ubuntu 728 Invalid 9/16/21 01:34:18 9/16/21 01:43:35 0.15 0.7 / 0.0
workunit 813129825: OPNG_0085719_00349_3-- Linux Ubuntu 728 Server Aborted 9/16/21 00:14:26 9/16/21 00:19:22 0.00 0.0 / 0.0
[Edit 3 times, last edit by adriverhoef at Sep 17, 2021 12:17:18 AM]
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline
I have 11 "four invalids and an abort" - 10 on NVidia (both Windows and Linux) and one iGPU. I think we have to conclude that the datasets resulted in an "unexpected item in the result file", rather than a systemic fault in the volunteer community.
|
Grumpy Swede
Master Cruncher Sweden Joined: Apr 10, 2020 Post Count: 2225 Status: Offline
Yup, me too: 3 Invalids and one abort so far.
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 988 Status: Offline
Quoting Richard Haselgrove:
I have 11 "four invalids and an abort" - 10 on NVidia (both Windows and Linux) and one iGPU. I think we have to conclude that the datasets resulted in an "unexpected item in the result file", rather than a systemic fault in the volunteer community.

Yup, this is the second time that a cluster of work units associated with the receptor 7jji_001--ALYS417_inert_rigid has thrown up lots of Invalids. There was a batch of tasks with numbers in the 0045xxx and low 0046xxx sequences, starting around 2 June 2021 and lasting about three or four days.

In both cases, most work units seem to end up with the majority of their tasks Invalid, perhaps one Too Late(!) and one or two Server Aborted. However, sometimes there is a Valid result (or even two, on very odd occasions!), so it looks as if whatever is causing the "unexpected item(s)" may be some odd edge case that gets processed differently on different GPUs. (I had a few where mine was Valid in the June set, but so far none in the September set.)

I presume/hope that these find their way back to the scientists in some form, and that they can decide what to do about them. As has been pointed out elsewhere, a result that comes back Invalid may actually reflect valid science!

Cheers - Al.

P.S. I have task names and outcomes for all the examples I've seen, should they be needed...