World Community Grid Forums
Thread Status: Active. Total posts in this thread: 567
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 2173 | Status: Offline
Unixchick said: At this point, I think it is not inconceivable that this is an intermittent, lingering effect of the general Internet outage caused by Cloudflare, which started shortly after 12:00 UTC... My queue is short so I can spot issues... My queue isn't refilling at a one in one out rate. This could be a short term issue that my short queue makes more noticeable, or I'm an early warning canary.

I think the system has [mostly] been in "trying to clear out pending retries" mode since around 14:50 UTC today. Since then I've managed to get a whole 3 (yes, three!) retries and nothing else :-(

Given that every scheduler request gets the "other platforms" message, I have to wonder what the bottleneck is, and whether it is specific to Linux users or not. A definitive answer could only be given by Dylan (or Igor), and they have been silent again for a week...

Ralf
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 209 | Status: Offline
Yes, I spoke too soon, sorry about that.

Our system with the smallest cache just ran dry. Just for clarity, I don't think we have any systems that store more than one day of work.

Reply to the above post from TPCBF: that is a really good point about Cloudflare. Didn't think of that.

Additional edit: if anyone has a specific way that we (Boca Raton HS) can help sift through our results to look for something/anything, just let us know. I have a group of students that is ready to help at any point.

[Edit 2 times, last edit by Boca Raton Community HS at Nov 18, 2025 7:09:46 PM]
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Offline
I'm getting fresh MCM WUs now.
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2493 | Status: Offline
Unixchick said: I'm getting fresh MCM WUs now.

Same here, Unixchick. A whole bunch of them filled my cache to my max setting.

[Edit 1 times, last edit by Grumpy Swede at Nov 19, 2025 2:23:54 AM]
MJH333
Senior Cruncher | England | Joined: Apr 3, 2021 | Post Count: 300 | Status: Offline
This is something I haven't seen very often... I'm currently processing MCM1_9999999_0017_9. See workunit 764547366.

Cheers, Mark
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 2173 | Status: Offline
MJH333 said: This is something I haven't seen very often... I'm currently processing MCM1_9999999_0017_9. See workunit 764547366. Cheers, Mark

Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those, IMHO...

Ralf
MJH333
Senior Cruncher | England | Joined: Apr 3, 2021 | Post Count: 300 | Status: Offline
TPCBF said: Well, those _9999999_ WUs were likely test units only to begin with, so anything can happen with those, IMHO...

My event log contains the following messages for that task:

7941 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing file
7942 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing input file
7943 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017 in scheduler reply
7944 World Community Grid 19/11/2025 16:54:45 [error] State file error: missing task MCM1_9999999_0017
7945 World Community Grid 19/11/2025 16:54:45 [error] Can't handle task MCM1_9999999_0017_9 in scheduler reply

That presumably explains why it has gone through so many iterations without success.

Cheers, Mark
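(For anyone who wants to watch for the same symptom on their own machine: the messages above come from the BOINC client's event log, which on most installations is also written to stdoutdae.txt in the BOINC data directory. Below is a minimal sketch, not an official tool, that scans that log for these particular errors and groups them by task name. The log path and the exact message wording are assumptions based only on the lines quoted above.)

```python
#!/usr/bin/env python3
# Hedged sketch: scan a BOINC client event log (stdoutdae.txt) for the
# "State file error" / "Can't handle task ... in scheduler reply" messages
# quoted above and group them by task name. The default log path and the
# message wording are assumptions; adjust them for your own setup.
import re
import sys
from collections import defaultdict
from pathlib import Path

LOG = Path(sys.argv[1] if len(sys.argv) > 1 else
           "/var/lib/boinc-client/stdoutdae.txt")  # assumed default location

ERROR_RE = re.compile(
    r"\[error\]\s+(State file error.*|Can't handle task\s+(\S+)\s+in scheduler reply)")

errors_by_task = defaultdict(list)
generic_errors = []

for line in LOG.read_text(errors="replace").splitlines():
    m = ERROR_RE.search(line)
    if not m:
        continue
    task = m.group(2)  # only set for the "Can't handle task ..." form
    if task:
        errors_by_task[task].append(line.strip())
    else:
        generic_errors.append(line.strip())

for task, lines in errors_by_task.items():
    print(f"{task}: {len(lines)} scheduler-reply error(s)")
print(f"other state-file errors: {len(generic_errors)}")
```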
Boca Raton Community HS
Senior Cruncher | Joined: Aug 27, 2021 | Post Count: 209 | Status: Offline
That work unit is destined for failure! Is there a limit before the server would terminate the work unit?

[Edit 1 times, last edit by Boca Raton Community HS at Nov 19, 2025 7:07:06 PM]
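(For context on the question above: a stock BOINC server does enforce such a limit. Each workunit carries max_error_results and max_total_results fields, and the transitioner stops issuing retries and marks the workunit as errored once either threshold is reached. WCG's actual values for MCM1 aren't given in this thread, so the numbers in the sketch below are placeholders, not WCG's settings.)

```python
# Hedged sketch of how a stock BOINC server decides a workunit has failed
# too many times. The limit values below are placeholders, not WCG's values.
from dataclasses import dataclass

@dataclass
class Workunit:
    name: str
    error_results: int          # results that ended in error
    total_results: int          # every result ever created for this WU
    max_error_results: int = 8  # placeholder; set per project/workunit
    max_total_results: int = 20 # placeholder; set per project/workunit

def too_many_failures(wu: Workunit) -> bool:
    """True if a stock BOINC transitioner would stop retrying this workunit."""
    return (wu.error_results >= wu.max_error_results
            or wu.total_results >= wu.max_total_results)

# The _9 suffix on MCM1_9999999_0017_9 means it is already the 10th result
# (numbering starts at _0), so whether it keeps being retried depends on the
# limits configured for that batch.
if __name__ == "__main__":
    wu = Workunit("MCM1_9999999_0017", error_results=9, total_results=10)
    print(too_many_failures(wu))
```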
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 1293 | Status: Offline
Just got some weird errors: https://www.worldcommunitygrid.org/contribution/workunit/776764589

I'm going to assume a bad batch. I got 3 in a row. Now I've got working WUs.

[Edit 1 times, last edit by Unixchick at Nov 19, 2025 9:14:24 PM]
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1316 | Status: Recently Active
Unixchick,

Those appear to be download errors - permanent HTTP error. Some days that seems to happen quite a few times mid-afternoon to early evening UTC. If the ones I'm getting are typical, it appears that one of the nodes has some sort of issue for a few minutes (as my failures at a given time all seem to be from fanouts in the same group).

I can only comment on tasks I've seen fail, as the fanout data doesn't appear anywhere in the stuff the APIs let us look at (so I wrote a script to monitor scheduler replies, where the information is available). I only have reliable data from 2025-11-14 onwards, though there were a lot of such errors before that day, some of them discussed in this thread during the first week of November, including Dylan mentioning a possible reason and a tentative fix.

On 2025-11-14 I saw errors around 14:30 and 15:40 (UTC) that seemed to be from the third group, and errors around 16:40 that seemed to be from the second group. I then had four error-free days, but today (2025-11-19) errors at around 14:55 seemed to be from the second group and errors at around 16:10 seemed to be from the fourth group.

Given that these errors seem to show up during their working day, I wonder if some development or diagnostic work is causing this, or whether it's a periodic issue we're going to have to get used to...

Note that in all cases, other users have managed to successfully download tasks for the WUs in question, so it's presumably not something corrupt on the download server(s).

ADDENDUM: I didn't check again until early on 2025-11-20, and it turned out there had been more largish batches of errors on 2025-11-19 at around 18:00 (fourth group), 18:15 (sixth group) and 18:45 (fourth group again). Not good...

Cheers - Al.

P.S. I think the master file is served from the second group...

[Edited for the addendum and the note about successful downloads]
[Edit 3 times, last edit by alanb1951 at Nov 20, 2025 3:51:01 AM]
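(Al's script isn't posted in the thread, but the general approach is reproducible: a stock BOINC client keeps the most recent scheduler reply for each project in its data directory, for WCG typically sched_reply_www.worldcommunitygrid.org.xml, and that XML lists each input file with its download URL, from which the fanout directory can be read. The sketch below illustrates the idea under those assumptions; the path, polling interval, and URL layout are guesses about a typical Linux install, not details from Al's actual script.)

```python
#!/usr/bin/env python3
# Hedged sketch of the idea described above: watch the last scheduler reply
# saved by the BOINC client and log which fanout directory each input file
# is served from. The reply filename and the .../download/<fanout>/<file>
# URL layout are assumptions based on stock BOINC; adjust for your setup.
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

REPLY = Path("/var/lib/boinc-client/sched_reply_www.worldcommunitygrid.org.xml")  # assumed path

def fanout_of(url: str) -> str:
    """Return the path component just before the filename (the fanout dir)."""
    parts = urlparse(url).path.rstrip("/").split("/")
    return parts[-2] if len(parts) >= 2 else "?"

def scan(reply_path: Path) -> None:
    # Scheduler replies are XML; each downloadable file appears as a
    # <file_info> element with a <name> and one or more <url> children.
    root = ET.parse(reply_path).getroot()
    for fi in root.iter("file_info"):
        name = fi.findtext("name", default="?")
        for url in fi.iter("url"):
            print(f"{time.strftime('%H:%M:%S')} {name} <- fanout {fanout_of(url.text or '')}")

if __name__ == "__main__":
    last_mtime = 0.0
    while True:                      # poll; the client rewrites the file after each RPC
        try:
            mtime = REPLY.stat().st_mtime
            if mtime != last_mtime:
                last_mtime = mtime
                scan(REPLY)
        except (OSError, ET.ParseError):
            pass                     # file missing or mid-rewrite; try again later
        time.sleep(30)
```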