Total posts in this thread: 171
This topic has been viewed 729229 times and has 170 replies
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: Invalid GPU work units

I've set my hosts to download tasks but not run them. I then look through them for any _3 and _4 resends. If there are invalids in the first 3 that were sent out, I just abort the _3 and _4 tasks. It also gives me a chance to see whether any of the _0 or _1 tasks have already returned an invalid and the server has issued a resend. If the resends are due to errors, then I'll let mine run. It's a lot of micromanaging, but I don't see much point in running a task again that already has numerous invalids. JMHO

Just got a 3436 _3 with 2 invalids. CYA!
And now a 3465 _2 with invalids
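
The manual check described above could be sketched roughly as follows. This is only an illustration: the task-name pattern, the `should_abort` helper and the status strings are assumptions based on the names and statuses quoted in this thread, not a WCG or BOINC API, and the statuses of earlier copies still have to be read off the work unit page by hand.

```python
import re

# Assumed name layout, based on names such as OPNG_0002340_00266_3:
# application, batch, job, replication index.
TASK_RE = re.compile(r"^(?P<app>[A-Z0-9]+)_(?P<batch>\d+)_(?P<job>\d+)_(?P<rep>\d+)$")


def replication_index(task_name):
    """Return the trailing replication number (_0, _1, ...) of a task name."""
    match = TASK_RE.match(task_name)
    if match is None:
        raise ValueError(f"unrecognised task name: {task_name}")
    return int(match.group("rep"))


def should_abort(task_name, earlier_statuses, min_rep=3):
    """Hypothetical rule: abort a late resend if an earlier copy went invalid.

    earlier_statuses is the list of statuses of the copies sent before
    this one, copied by hand from the work unit page.
    Resends caused only by errors are left to run.
    """
    if replication_index(task_name) < min_rep:
        return False
    return any(status == "Invalid" for status in earlier_statuses)


if __name__ == "__main__":
    print(should_abort("OPNG_0002340_00266_3", ["Valid", "Invalid", "Invalid"]))
    print(should_abort("OPNG_0002340_00266_3", ["Error", "Error", "Valid"]))
```

The `min_rep` threshold of 3 mirrors the "_3 and _4" rule of thumb; lowering it would also catch cases like the _2 resend mentioned above.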
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edit 4 times, last edit by nanoprobe at Apr 16, 2021 12:16:57 AM]
[Apr 16, 2021 12:03:07 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2220
Re: Invalid GPU work units

Yeah, good idea. I'm doing the same. No need to crunch tasks that go invalid instantly. This does not look good at all for the moment....

Edit: And another good thing about downloading them but not running them is that they will become Server Aborted automatically if there are too many invalid wingmen. So, not much micromanaging needed at all :)
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 16, 2021 12:30:27 AM]
[Apr 16, 2021 12:25:13 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Invalid GPU work units

Just seen one of my Linux/NVidia machines catch a total of 24 tasks in two consecutive fetches. Most of them were early replications - _0 or _1, and most were of the "short running, faulty batch" type. I'm not worried, or complaining - I run a moderately powerful machine, precisely so I can flush these tasks through the system as quickly as possible, and we can move on to the task groupings where the extra power of GPUs will make the most difference. Einstein lost about 15 minutes of crunching, but I think they can cope with that.

But it prompted a couple of thoughts.
(1) Every single one of these tasks that I spot-checked was paired with other Linux machines. I spot-checked a Windows machine, and every single wingmate there was another Windows machine. We know (Uplinger has told us) that a task requiring confirmation is flagged to need the same class of GPU: it appears that applies to OS as well. Coming from the SETI stable, where we struggled mightily to ensure that every version - CPU or GPU; NV, AMD or Intel; Windows, Linux or Mac; stock, optimised, or third-party - produced compatible and validatable results, these isolated 'bubbles' of validation feel very strange. I do hope that cross-bubble verification is being performed elsewhere in the system.
(2) This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system.
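
To extend the spot-check in (1) beyond a handful of tasks, a rough sketch like the one below would flag any work unit whose copies went to more than one OS/GPU class. The record layout, the `validation_class` helper and the sample data are all hypothetical; this is not how the project's scheduler is known to classify hosts.

```python
from collections import defaultdict


def validation_class(host):
    """Assumed 'bubble' key: operating system plus GPU vendor."""
    return (host["os"], host["gpu_vendor"])


def mixed_workunits(results):
    """Return the work units whose copies span more than one class."""
    classes = defaultdict(set)
    for result in results:
        classes[result["workunit"]].add(validation_class(result["host"]))
    return sorted(wu for wu, seen in classes.items() if len(seen) > 1)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"workunit": "wu-a", "host": {"os": "Linux", "gpu_vendor": "NVIDIA"}},
        {"workunit": "wu-a", "host": {"os": "Linux", "gpu_vendor": "NVIDIA"}},
        {"workunit": "wu-b", "host": {"os": "Windows", "gpu_vendor": "AMD"}},
        {"workunit": "wu-b", "host": {"os": "Linux", "gpu_vendor": "AMD"}},
    ]
    print(mixed_workunits(sample))
```

An empty result over a large sample would support the 'bubble' observation; any entry would be a counter-example worth looking at.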
[Apr 16, 2021 10:29:00 AM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Invalid GPU work units

Here are work units with 2 valid and some invalids. All Windows 10 or 7.
OPNG_0002340_00266_3 AMD: 2 valid (GCN5 gfx906, GCN5 gfx902), 2 invalid (GCN4 Ellesmere, RDNA gfx1010), server abort on me.
OPNG_0002337_00249_2 NVidia: 2 valid (Turing: Quadro T2000, GTX 1660 SUPER), 2 invalid (Pascal: GTX 1060 6GB, my GT 1030).
OPNG_0002357_00300_3 AMD: 2 valid (GCN1 Tahiti, GCN5 gfx906), 2 invalid (GCN4 Bristol Ridge, GCN4 Ellesmere), server abort on me.
OPNG_0003043_00497_2 AMD: 2 valid (RDNA gfx1010, my RDNA gfx1012), 3 invalid (GCN4 Ellesmere, GCN4 Ellesmere, GCN1 Capeverde)

1 valid, 2 invalids.
OPNG_0002462_00189_1 AMD: 1 valid (my RDNA gfx1012), 2 invalid (GCN1 Capeverde, GCN4 gfx804)

"Too Late", too many invalids. All Windows.
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=622152832
OPNG_0002976_00177_4 AMD: My Too Late (RDNA gfx1012), 3 invalid (GCN3 Iceland, GCN5 gfx906, RDNA gfx1010), 1 server aborted
OPNG_0002324_00156_2 NVidia: 1 Too Late (RTX 3070), 3 invalid (RTX 2080 Ti, GTX 1080, GTX 1050), server abort on me.
OPNG_0002324_00067_2 NVidia: 1 Too Late (GTX 1070), 3 invalid (GTX 1660 SUPER, GTX 1660 Ti, GT 1030), server abort on me.
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=620380537

There are plenty with no valid, only invalid and server abort. Some Windows, some Linux.
OPNG_0003043_00113_4 OPNG_0003043_00110_4 OPNG_0002976_00535_2
OPNG_0002636_00045_4 OPNG_0002614_00109_4 OPNG_0002614_00282_0
OPNG_0002423_00359_3 OPNG_0002333_00184_0 OPNG_0002330_00065_3
OPNG_0002322_00154_3 OPNG_0002324_00250_3 OPNG_0002324_00233_1
OPNG_0002324_00121_1 OPNG_0002245_00195_3

For work units with some invalids, I am unsure whether some GPUs make random errors, or whether something is wrong with specific GPU architectures (NVidia Pascal vs Turing, AMD GCN4 vs RDNA) on some work units.
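
One way to look at the architecture question is to tally outcomes per architecture. The sketch below does that for the AMD work units listed above that had at least one valid result; the data layout and the `tally` helper are just for illustration, and the sample is far too small to separate random errors from an architecture-specific problem.

```python
from collections import Counter

# (work unit, valid architectures, invalid architectures),
# copied from the AMD entries in this post.
REPORTS = [
    ("OPNG_0002340_00266", ["GCN5", "GCN5"], ["GCN4", "RDNA"]),
    ("OPNG_0002357_00300", ["GCN1", "GCN5"], ["GCN4", "GCN4"]),
    ("OPNG_0003043_00497", ["RDNA", "RDNA"], ["GCN4", "GCN4", "GCN1"]),
    ("OPNG_0002462_00189", ["RDNA"], ["GCN1", "GCN4"]),
]


def tally(reports):
    """Count valid and invalid results per GPU architecture."""
    valid, invalid = Counter(), Counter()
    for _workunit, ok, bad in reports:
        valid.update(ok)
        invalid.update(bad)
    return valid, invalid


if __name__ == "__main__":
    valid, invalid = tally(REPORTS)
    for arch in sorted(set(valid) | set(invalid)):
        print(f"{arch}: {valid[arch]} valid, {invalid[arch]} invalid")
```

In these four work units GCN4 shows up only on the invalid side, while GCN1 and RDNA appear on both, which hints at an architecture effect without proving one.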

For "Too Late" work units, I wonder why a resend gets "Server aborted" when it could have been given one more chance to validate and verify.
[Apr 16, 2021 10:42:07 AM]
Jorlin
Advanced Cruncher
Deutschland
Joined: Jan 22, 2020
Post Count: 90
Re: Invalid GPU work units

Richard Haselgrove wrote: "This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system."

Just checked. The last time an OPNG (_0) task went through without a wingman was 8 hours ago. After that I always got a wingman assigned.
Guess I'm unreliable now...
[Edit 1 times, last edit by Jorlin at Apr 16, 2021 10:58:52 AM]
[Apr 16, 2021 10:46:22 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2220
Re: Invalid GPU work units

Well, as one infamous person said "It is what it is".
I'm letting my main computer rest until this, whatever it is, is fixed.
[Apr 16, 2021 2:44:40 PM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 674
Re: Invalid GPU work units

Richard Haselgrove wrote: "This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system."

Jorlin wrote: "Just checked. The last time an OPNG (_0) task went through without a wingman was 8 hours ago. After that I always got a wingman assigned. Guess I'm unreliable now..."

I had a valid without a wingman this afternoon, but was back to requiring a wingman right after that as the next few units were from the bad batch. Not worried though as the machine is running 24 MCM units simultaneously on the CPU in between any OPNG units it snags, and all those MCM units return as valids, so the machine will be back to having a reliable reputation within an hour or so of the bad units flushing through.
[Apr 16, 2021 4:01:13 PM]
zdnko
Senior Cruncher
Joined: Dec 1, 2005
Post Count: 229
Re: Invalid GPU work units

Can someone explain to me how the Too Late status works?

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=620639614
Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
OPNG_0002331_00101_3-- | Microsoft Windows 10 | Core x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:48:11 | 4/16/21 15:50:15 | 0.01 | 0.3 / 0.0
OPNG_0002331_00101_4-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:48:11 | 4/16/21 15:50:14 | 0.02 | 1.3 / 0.0
OPNG_0002331_00101_2-- | Microsoft Windows 10 | Core x64 Edition, (10.00.19041.00) | 728 | Too Late | 4/16/21 15:41:12 | 4/16/21 15:48:03 | 0.06 | 0.1 / 0.0
OPNG_0002331_00101_1-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:41:06 | 4/16/21 15:43:12 | 0.01 | 0.2 / 0.0
OPNG_0002331_00101_0-- | Linux Ubuntu | Ubuntu 20.04 LTS [5.4.0-71-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)] | 728 | User Aborted | 4/15/21 06:40:03 | 4/16/21 15:41:04 | 0.00 | 0.0 / 0.0

OPNG_0002331_00101_2
Sent Time: 15:41:12
Return Time: 15:48:03
Less than 7 minutes!

A few seconds later the WU was sent 2 more times.

What Causes Too Late?
If my Return Time is Too Late why was the wu sent again?
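
For reference, the turnaround arithmetic can be checked with a few lines of Python. The timestamp format string is an assumption based on how the times are displayed in the result listing, and `gap` is just a helper name for this sketch.

```python
from datetime import datetime

# Assumed display format of the result page, e.g. "4/16/21 15:41:12".
FMT = "%m/%d/%y %H:%M:%S"


def gap(start, end):
    """Return the time between two timestamps from the result listing."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


if __name__ == "__main__":
    # The _2 copy: sent 15:41:12, returned 15:48:03.
    print("turnaround of _2:", gap("4/16/21 15:41:12", "4/16/21 15:48:03"))
    # The _3 and _4 copies were sent at 15:48:11, right after that return.
    print("resend delay:", gap("4/16/21 15:48:03", "4/16/21 15:48:11"))
```

That gives a turnaround of 6 minutes 51 seconds for the _2 copy, with the _3 and _4 copies going out 8 seconds after it was returned.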
[Apr 16, 2021 5:51:56 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: Invalid GPU work units

That usually happens when the server has sent the "server abort" command to a task that has already started. The command from the server was "Too Late" to abort the task.
[Edit 1 times, last edit by nanoprobe at Apr 16, 2021 6:03:14 PM]
[Apr 16, 2021 6:01:57 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2220
Re: Invalid GPU work units

OK, I give up on GPU crunching here until the "invalid" crap is solved. My (no longer) reliable but slow GPU normally spends an hour or so on a task, much less on these crap batches of course, but it's a total waste of time since most of the tasks I get now end up invalid. All my latest _0 tasks (yup, reliable and no wingman) ended as invalid. Total waste of time and energy.

I'll also stop trying with my faster GTX980/iGPU HD4600 computer. Why waste time and energy on stuff that immediately goes invalid, or becomes server aborted?

So, again: it's not only batches 2225-3336 that have the issue, but almost all, if not all, of the more recent batches.

I'll save on electricity until this is fixed.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Apr 16, 2021 6:07:51 PM]
[Apr 16, 2021 6:04:44 PM]