World Community Grid Forums
Thread Status: Active Total posts in this thread: 171
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
I've set my hosts to download tasks but not run them. I then look through them for any _3 and _4 resends. If there are invalids among the first 3 that were sent out, I just abort the _3 and _4 tasks. It also gives me a chance to see whether any of the _0 or _1 tasks have already returned an invalid and the server has issued a resend. If the resends are due to errors, then I'll let mine run. It's a lot of micromanaging, but I don't see much point in running a task again that already has numerous invalids. JMHO

Just got a 3436 _3 with 2 invalids. CYA! And now a 3465 _2 with invalids.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 4 times, last edit by nanoprobe at Apr 16, 2021 12:16:57 AM]
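For anyone who wants to automate the "look through them for _3 and _4 resends" step, here is a minimal sketch. It assumes task names follow the `OPNG_<batch>_<unit>_<replica>` pattern seen in this thread; the names `parse_task_name` and `late_resends` are made up for illustration and are not part of BOINC or any WCG tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed name layout: PROJECT_batch_unit_replica, e.g. OPNG_0002340_00266_3
TASK_NAME = re.compile(
    r"^(?P<project>[A-Z]+)_(?P<batch>\d+)_(?P<unit>\d+)_(?P<replica>\d+)$"
)


class Task(NamedTuple):
    project: str
    batch: int
    unit: int
    replica: int


def parse_task_name(name: str) -> Optional[Task]:
    """Split a task name into its parts, or return None if it does not match."""
    m = TASK_NAME.match(name.strip())
    if m is None:
        return None
    return Task(m["project"], int(m["batch"]), int(m["unit"]), int(m["replica"]))


def late_resends(names, min_replica=3):
    """Return the tasks whose replica suffix is at least min_replica (_3, _4, ...)."""
    tasks = (parse_task_name(n) for n in names)
    return [t for t in tasks if t is not None and t.replica >= min_replica]
```

This only flags candidates. Whether a flagged task is worth aborting still depends on checking the wingmen's results on the work unit page by hand.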
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2220 Status: Offline
> I've set my hosts to download tasks but not run them. I then look through them for any _3 and _4 resends. If there are invalids among the first 3 that were sent out, I just abort the _3 and _4 tasks. It also gives me a chance to see whether any of the _0 or _1 tasks have already returned an invalid and the server has issued a resend. If the resends are due to errors, then I'll let mine run. It's a lot of micromanaging, but I don't see much point in running a task again that already has numerous invalids. JMHO
>
> Just got a 3436 _3 with 2 invalids. CYA! And now a 3465 _2 with invalids.

Yeah, good idea. I'm doing the same. No need to crunch tasks that go invalid instantly. This does not look good at all at the moment...

Edit: Another good thing about downloading them but not running them is that they will be Server Aborted automatically if there are too many invalid wingmen. So, not much micromanaging needed at all.

[Edit 1 times, last edit by Grumpy Swede at Apr 16, 2021 12:30:27 AM]
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline
Just seen one of my Linux/NVidia machines catch a total of 24 tasks in two consecutive fetches. Most of them were early replications (_0 or _1), and most were of the "short running, faulty batch" type. I'm not worried or complaining: I run a moderately powerful machine precisely so I can flush these tasks through the system as quickly as possible, and we can move on to the task groupings where the extra power of GPUs will make the most difference. Einstein lost about 15 minutes of crunching, but I think they can cope with that.

But it prompted a couple of thoughts.

(1) Every single one of these tasks that I spot-checked was paired with other Linux machines. I spot-checked a Windows machine, and every single wingmate there was another Windows machine. We know (Uplinger has told us) that a task requiring confirmation is flagged to need the same class of GPU; it appears that applies to OS as well. Coming from the SETI stable, where we struggled mightily to ensure that every version (CPU or GPU; NV, AMD or Intel; Windows, Linux or Mac; stock, optimised, or third-party) produced compatible and validatable results, these isolated 'bubbles' of validation feel very strange. I do hope that cross-bubble verification is being performed elsewhere in the system.

(2) This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system.
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline
Here are work units with 2 valid and some invalids. All Windows 10 or 7.

OPNG_0002340_00266_3 AMD: 2 valid (GCN5 gfx906, GCN5 gfx902), 2 invalid (GCN4 Ellesmere, RDNA gfx1010), server abort on me.
OPNG_0002337_00249_2 NVidia: 2 valid (Turing: Quadro T2000, GTX 1660 SUPER), 2 invalid (Pascal: GTX 1060 6GB, my GT 1030).
OPNG_0002357_00300_3 AMD: 2 valid (GCN1 Tahiti, GCN5 gfx906), 2 invalid (GCN4 Bristol Ridge, GCN4 Ellesmere), server abort on me.
OPNG_0003043_00497_2 AMD: 2 valid (RDNA gfx1010, my RDNA gfx1012), 3 invalid (GCN4 Ellesmere, GCN4 Ellesmere, GCN1 Capeverde)

1 valid, 2 invalids.

OPNG_0002462_00189_1 AMD: 1 valid (my RDNA gfx1012), 2 invalid (GCN1 Capeverde, GCN4 gfx804)

"Too Late", too many invalids. All Windows.

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=622152832
OPNG_0002976_00177_4 AMD: My Too Late (RDNA gfx1012), 3 invalid (GCN3 Iceland, GCN5 gfx906, RDNA gfx1010), 1 server aborted
OPNG_0002324_00156_2 NVidia: 1 Too Late (RTX 3070), 3 invalid (RTX 2080 Ti, GTX 1080, GTX 1050), server abort on me.
OPNG_0002324_00067_2 NVidia: 1 Too Late (GTX 1070), 3 invalid (GTX 1660 SUPER, GTX 1660 Ti, GT 1030), server abort on me.
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=620380537

There are plenty with no valid, only invalid and server abort. Some Windows, some Linux.

OPNG_0003043_00113_4
OPNG_0003043_00110_4
OPNG_0002976_00535_2
OPNG_0002636_00045_4
OPNG_0002614_00109_4
OPNG_0002614_00282_0
OPNG_0002423_00359_3
OPNG_0002333_00184_0
OPNG_0002330_00065_3
OPNG_0002322_00154_3
OPNG_0002324_00250_3
OPNG_0002324_00233_1
OPNG_0002324_00121_1
OPNG_0002245_00195_3

For work units with some invalids, I am unsure whether some GPUs make random errors or whether something is wrong with a specific GPU architecture (NVidia Pascal vs Turing, AMD GCN4 vs RDNA) on some work units.

For "Too Late" work units, I wonder why a resend gets "Server aborted" when it could have been given one more chance to validate and verify?
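One rough way to look at the "random errors or specific architecture" question is to tally outcomes per architecture family. The sketch below does that for the three AMD work units listed above that had 2 valid results; the data is copied by hand from this post, and `REPORTS` and `tally` are illustrative names only.

```python
from collections import Counter

# (task, architectures that validated, architectures that went invalid),
# copied by hand from the AMD "2 valid" entries in this post.
REPORTS = [
    ("OPNG_0002340_00266_3", ["GCN5", "GCN5"], ["GCN4", "RDNA"]),
    ("OPNG_0002357_00300_3", ["GCN1", "GCN5"], ["GCN4", "GCN4"]),
    ("OPNG_0003043_00497_2", ["RDNA", "RDNA"], ["GCN4", "GCN4", "GCN1"]),
]


def tally(reports):
    """Count valid and invalid results per architecture family."""
    valid, invalid = Counter(), Counter()
    for _task, ok, bad in reports:
        valid.update(ok)
        invalid.update(bad)
    return valid, invalid


valid, invalid = tally(REPORTS)
for arch in sorted(set(valid) | set(invalid)):
    print(f"{arch}: {valid[arch]} valid, {invalid[arch]} invalid")
```

Three work units are far too few to conclude anything, but the same tally over a larger hand-collected sample might show whether one family is consistently on the losing side.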
Jorlin
Advanced Cruncher Deutschland Joined: Jan 22, 2020 Post Count: 90 Status: Offline
> This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system.

Just checked. The last time an OPNG (_0) task went through without a wingman was 8 hours ago. After that I always got a wingman assigned. Guess I'm unreliable now...

[Edit 1 times, last edit by Jorlin at Apr 16, 2021 10:58:52 AM]
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2220 Status: Offline
Well, as one infamous person said, "It is what it is".
I'm letting my main computer rest until this, whatever it is, is fixed.
widdershins
Veteran Cruncher Scotland Joined: Apr 30, 2007 Post Count: 674 Status: Offline
> > This project allows 'reliable' hosts to be trusted to report valid results without verification by replication. If the majority of hosts satisfy the 'reliable' condition, that will increase project efficiency massively. But does the occasional rogue 'bad batch' affect the reliability rating of a host? That would slow down progress again until the effects are flushed out of the system.
>
> Just checked. The last time an OPNG (_0) task went through without a wingman was 8 hours ago. After that I always got a wingman assigned. Guess I'm unreliable now...

I had a valid without a wingman this afternoon, but was back to requiring a wingman right after that, as the next few units were from the bad batch. Not worried though: the machine is running 24 MCM units simultaneously on the CPU in between any OPNG units it snags, and all those MCM units return as valid, so the machine will be back to having a reliable reputation within an hour or so of the bad units flushing through.
zdnko
Senior Cruncher Joined: Dec 1, 2005 Post Count: 229 Status: Offline
Can someone explain to me how the Too Late status works?
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=620639614

| Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|---|---|
| OPNG_0002331_00101_3-- | Microsoft Windows | 10 Core x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:48:11 | 4/16/21 15:50:15 | 0.01 | 0.3 / 0.0 |
| OPNG_0002331_00101_4-- | Microsoft Windows | 10 Professional x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:48:11 | 4/16/21 15:50:14 | 0.02 | 1.3 / 0.0 |
| OPNG_0002331_00101_2-- | Microsoft Windows | 10 Core x64 Edition, (10.00.19041.00) | 728 | Too Late | 4/16/21 15:41:12 | 4/16/21 15:48:03 | 0.06 | 0.1 / 0.0 |
| OPNG_0002331_00101_1-- | Microsoft Windows | 10 Professional x64 Edition, (10.00.19042.00) | 728 | Invalid | 4/16/21 15:41:06 | 4/16/21 15:43:12 | 0.01 | 0.2 / 0.0 |
| OPNG_0002331_00101_0-- | Linux Ubuntu | Ubuntu 20.04 LTS [5.4.0-71-generic\|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)] | 728 | User Aborted | 4/15/21 06:40:03 | 4/16/21 15:41:04 | 0.00 | 0.0 / 0.0 |

0002331_00101_2
Sent Time: 15:41:12
Return Time: 15:48:03
Less than 7 minutes!
A few seconds later the WU was sent out 2 more times.
What causes Too Late?
If my return time is Too Late, why was the WU sent again?
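The timings quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the month/day/two-digit-year format shown on the result page, and `turnaround` is a name invented for this example.

```python
from datetime import datetime, timedelta

# Timestamp format as it appears in the result table (assumed M/D/YY, 24-hour clock)
FMT = "%m/%d/%y %H:%M:%S"


def turnaround(start: str, end: str) -> timedelta:
    """Return the time elapsed between two result-page timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Sent time and return time of the _2 task marked Too Late
print(turnaround("4/16/21 15:41:12", "4/16/21 15:48:03"))  # 0:06:51

# Gap between the _2 return and the _3/_4 resends being sent
print(turnaround("4/16/21 15:48:03", "4/16/21 15:48:11"))  # 0:00:08
```

So the _2 task came back 6 minutes 51 seconds after it was sent, and the _3 and _4 copies went out 8 seconds after that.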
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
> Can someone explain to me how the Too Late status works?
> What causes Too Late? If my return time is Too Late, why was the WU sent again?

That usually happens when the server has sent the "server abort" command to a task that has already started. The command from the server was "Too Late" to abort the task.
[Edit 1 times, last edit by nanoprobe at Apr 16, 2021 6:03:14 PM]
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2220 Status: Offline
OK, I give up on GPU crunching here until the "invalid" crap is solved. My reliable (no longer) but slow GPU normally spends an hour or so on tasks, much less on these crap batches of course, but it's a total waste of time since most of the batches I get now end up invalid. All my latest _0 tasks (yup, reliable and no wingman) ended up invalid. Total waste of time and energy.

I'll also stop trying with my faster GTX980/iGPU HD4600 computer. Why waste time and energy on stuff that immediately goes invalid or gets server aborted? So, again: it's not only batches 2225-3336 that have the issue, but almost all, if not all, of the more recent batches. I'll save on electricity until this is fixed.

[Edit 2 times, last edit by Grumpy Swede at Apr 16, 2021 6:07:51 PM]