| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 48
|
|
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1404 Status: Offline Project Badges:
|
The 3 tasks of batch OPNG_0086596,
I still had in queue 'Ready to start', were aborted by the server, so someone discovered/declared that as a bad batch. |
||
|
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 300 Status: Offline Project Badges:
|
Crystal Pellet wrote: The 3 tasks of batch OPNG_0086596, I still had in queue 'Ready to start', were aborted by the server, so someone discovered/declared that as a bad batch.
Same here. |
||
|
|
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline Project Badges:
|
Crystal Pellet wrote: The 3 tasks of batch OPNG_0086596, I still had in queue 'Ready to start', were aborted by the server, so someone discovered/declared that as a bad batch.
I'm not so sure. I have something like 70 errors in my list - most recent at 8/19/21 10:39:23 UTC. I've also got just four 'aborted by server', but they were earlier - most recent yesterday, at 8/18/21 13:17:11 UTC.
The server only cancelled replications _3 or _4 - only issued when the first three tasks had failed. I think the server cancels the WU automatically when 4 out of 5 have failed: I don't see any sign of a mass cancellation, and there's no other category in the filter list which seems to match that possibility. |
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1404 Status: Offline Project Badges:
|
Ok Richard, I should have had a closer look, so I did now. The results were not all still on my list.
Here are 3 status(UK) / statuses(US) of the workunits of three server aborted tasks. Not exactly the same three as mentioned before. Still one of the mentioned batch and 2 of batch 0086596.
Minimum quorum 1:
OPNG_0086596_00013_1 -- Server Aborted 18-8-21 11:26:06 19-8-21 13:43:23 0,00 0,0 / 0,0
OPNG_0086596_00013_0 -- Error 18-8-21 09:03:51 18-8-21 11:26:01 0,00 0,0 / 0,0
Minimum quorum 1:
OPNG_0086596_00016_1 -- Server Aborted 18-8-21 11:26:06 19-8-21 13:43:23 0,00 0,0 / 0,0
OPNG_0086596_00016_0 -- Error 18-8-21 09:03:51 18-8-21 11:26:01 0,00 0,0 / 0,0
Minimum quorum 2:
OPNG_0086586_00093_1 -- Server Aborted 18-8-21 15:07:49 19-8-21 13:43:23 0,00 0,0 / 0,0
OPNG_0086586_00093_0 -- Server Aborted 18-8-21 15:07:47 19-8-21 13:08:40 0,00 0,0 / 0,0
[Edit 2 times, last edit by Crystal Pellet at Aug 20, 2021 6:01:28 AM] |
||
|
|
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline Project Badges:
|
Mine were Minimum Quorum: 2, Replication: 2, from batches 0086577 and 0086579.
That does suggest some attempt to remove them from the system, but I don't think we have any evidence that they have acted on a wider scale. Maybe only a few tasks from each batch were affected - that would make the situation far harder to unpick. |
||
|
|
Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline Project Badges:
|
Quoting my own post, but with an updated list of batch numbers. These include some more from my own account, plus all the extras which have been posted in this thread so far.
OPNG_0086575
OPNG_0086576
OPNG_0086577
OPNG_0086578
OPNG_0086579
OPNG_0086580
OPNG_0086583
OPNG_0086584
OPNG_0086585
OPNG_0086586
OPNG_0086587
OPNG_0086588
OPNG_0086589
OPNG_0086591
OPNG_0086592
OPNG_0086593
OPNG_0086594
OPNG_0086595
OPNG_0086596
OPNG_0086598
OPNG_0086599
OPNG_0086600
OPNG_0086601
OPNG_0086602
OPNG_0086603
Not quite every batch in the range 86575 - 86603, but most of them. I haven't had any new failures for over 24 hours, so hopefully we're over this particular glitch. |
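For anyone who wants to check which batches in that range have not been reported yet, here is a minimal Python sketch. The batch names are copied from the list in this post; the function name `unreported_batches` is made up for illustration, and the range bounds are the ones quoted above (treated as inclusive).

```python
# Batch names reported as failing, copied from the list in this thread.
REPORTED = """
OPNG_0086575 OPNG_0086576 OPNG_0086577 OPNG_0086578 OPNG_0086579
OPNG_0086580 OPNG_0086583 OPNG_0086584 OPNG_0086585 OPNG_0086586
OPNG_0086587 OPNG_0086588 OPNG_0086589 OPNG_0086591 OPNG_0086592
OPNG_0086593 OPNG_0086594 OPNG_0086595 OPNG_0086596 OPNG_0086598
OPNG_0086599 OPNG_0086600 OPNG_0086601 OPNG_0086602 OPNG_0086603
""".split()


def unreported_batches(names, first, last):
    """Return batch numbers in [first, last] that do not appear in names.

    Hypothetical helper: assumes each name looks like 'OPNG_<batch number>'.
    """
    seen = {int(name.split("_")[1]) for name in names}
    return sorted(set(range(first, last + 1)) - seen)


if __name__ == "__main__":
    # Range quoted in the post, inclusive at both ends.
    print(unreported_batches(REPORTED, 86575, 86603))
    # prints [86581, 86582, 86590, 86597]
```

So 25 of the 29 batch numbers in the range show up in the list, and four have not been reported by anyone here so far.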
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
My apologies for not chiming in earlier. You are indeed correct that all of the workunits in the batches OPNG_0086575 through OPNG_0086604 failed. Once I observed the issue, I had the server stop sending out any more from that set. There was a misconfiguration in the inputs; it is being corrected, and the work will then be sent out with new batch numbers.
We apologize for the issue and you shouldn't see any more tasks from these batches. |
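To check whether a task still sitting in a local queue belongs to one of these batches, a small Python sketch along the following lines may help. It assumes task names follow the layout seen in this thread (`OPNG_<7-digit batch>_<5-digit workunit>_<replication>`); the names `parse_task_name` and `is_affected` are hypothetical, and the range is the one stated above, taken as inclusive.

```python
import re
from typing import NamedTuple, Optional

# Affected range as stated in the post above (assumed inclusive).
AFFECTED_FIRST = 86575
AFFECTED_LAST = 86604

# Assumed layout: optional 'OPNG_' prefix, 7-digit batch, 5-digit workunit,
# optional replication suffix. Inferred from names quoted in this thread.
_NAME = re.compile(r"^(?:OPNG_)?(\d{7})_(\d{5})(?:_(\d+))?$")


class TaskName(NamedTuple):
    batch: int
    workunit: int
    replication: Optional[int]


def parse_task_name(name: str) -> TaskName:
    """Split a task name into batch, workunit and replication numbers."""
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    batch, workunit, replication = match.groups()
    return TaskName(
        int(batch),
        int(workunit),
        int(replication) if replication is not None else None,
    )


def is_affected(name: str) -> bool:
    """True if the task's batch falls inside the affected range."""
    return AFFECTED_FIRST <= parse_task_name(name).batch <= AFFECTED_LAST


if __name__ == "__main__":
    for task in ("OPNG_0086596_00013_1", "OPNG_0086586_00093_0", "0078688_00043"):
        print(task, is_affected(task))
```

The first two names are taken from results posted earlier in the thread and fall inside the range; the third comes from a different batch and does not.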
||
|
|
biini
Senior Cruncher Finland Joined: Jan 25, 2007 Post Count: 334 Status: Offline Project Badges:
|
Not yet errors, but WUs 0078688_00043, 0078688_00028 and 0078688_00033 have been stuck for 16 hours on AMD... or it could be my AMD just acting up. |
||
|
|
|