World Community Grid Forums
Thread Status: Active Total posts in this thread: 35
MarkH
Advanced Cruncher United States of America Joined: May 16, 2020 Post Count: 66
Thanks to everyone who replied. Since my message I've run a few dozen ARP tasks without incident, even though I didn't change anything on the computer or in the software. Maybe I got hit by cosmic rays on the failed ones.
----------------------------------------
That science of the people, by the people, for the people, shall not perish from the Earth.
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316
I've just checked my latest returns, and I note that I now have three tasks with Validation errors. I doubt it's an issue specific to my system, as wingmen seem to be having the same problem! I can confirm that my errors were "Validate error" and the others show the same characteristics -- an apparently viable result with an unexplained Error status -- so they presumably suffered the same fate.
Here are the WU IDs, names and the current status for my three cases:
WU 750849345 -- ARP1_0001267_150
Elsewhere, adriverhoef has reported similar errors for three different WUs, so there's something odd going on somewhere...
Cheers - Al.
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346
So far today (and the day isn't over yet):
workunit 750700147 ARP1_0000393_150_0 Linux Ubuntu Error 2025-07-31T19:28:27 2025-08-02T12:17:21 10.48/10.50
workunit 750884956 ARP1_0033794_149_0 Linux Ubuntu Error 2025-08-01T03:43:39 2025-08-02T01:55:49 11.20/11.27
workunit 750387046 ARP1_0031990_149_0 Linux Ubuntu Error 2025-07-31T06:00:41 2025-08-01T19:22:43 6.84/6.87
workunit 750849353 ARP1_0033547_149_0 Linux Ubuntu Error 2025-08-01T02:10:21 2025-08-01T14:58:51 8.56/8.57
workunit 750472354 ARP1_0033325_149_0 Linux Debian Error 2025-07-31T09:39:01 2025-08-01T03:22:07 7.55/7.55
workunit 749756108 ARP1_0030953_149_0 MSWin 11 Error 2025-07-30T04:19:03 2025-08-03T00:51:52 29.69/77.38
workunit 750632570 ARP1_0034825_149_0 Linux Ubuntu Error 2025-07-31T16:32:52 2025-08-03T00:55:20 32.03/32.17
workunit 750633731 ARP1_0034708_149_0 MSWin 11 Error 2025-07-31T16:41:28 2025-08-01T20:54:01 12.77/19.70
workunit 750954220 ARP1_0001770_150_0 MSWin 11 Error 2025-08-01T06:36:16 2025-08-03T00:02:08 14.05/28.38
workunit 750442625 ARP1_0033497_149_0 Linux Fedora Error 2025-07-31T08:24:27 2025-08-03T10:20:44 6.53/6.57
workunit 751103565 ARP1_0003249_150_0 Linux Debian Error 2025-08-01T13:04:48 2025-08-02T06:10:49 10.17/10.19
workunit 750976341 ARP1_0002097_150_0 Linux Debian Error 2025-08-01T07:33:15 2025-08-03T01:57:45 10.98/10.99
workunit 750676907 ARP1_0033714_149_0 Linux Ubuntu Error 2025-07-31T18:25:21 2025-08-02T14:43:28 7.92/7.97
workunit 750448393 ARP1_0033651_149_0 Linux Fedora Error 2025-07-31T08:39:14 2025-08-03T13:32:52 6.45/6.50
workunit 750907223 ARP1_0035489_149_0 Fedora Linux Error 2025-08-01T04:41:15 2025-08-03T13:33:32 6.31/6.35
workunit 750407477 ARP1_0033054_149_0 Linux GNOME Error 2025-07-31T06:55:11 2025-08-03T16:31:16 19.07/19.08
workunit 750133541 ARP1_0032459_149_0 Darwin Error 2025-07-30T19:30:17 2025-08-02T19:07:57 8.69/9.87
workunit 749576011 ARP1_0027763_149_0 Linux Ubuntu Error 2025-07-29T20:43:46 2025-08-03T08:33:16 24.53/24.53
workunit 750596009 ARP1_0032219_149_0 Linux Fedora Error 2025-07-31T14:59:47 2025-08-03T23:34:15 6.49/6.54
Adri
----------------------------------------
[Edit 12 times, last edit by adriverhoef at Aug 8, 2025 8:38:51 AM]
geophi
Advanced Cruncher U.S. Joined: Sep 3, 2007 Post Count: 113
Mine, from when my Linux PC downloaded a retry yesterday:
WU -- ARP1_0034000_149
_0 and _1 errored (Windows), _2 and _3 errored (Linux), _4 and _5 running (Linux).
All errors are validation errors.
----------------------------------------
[Edit 1 times, last edit by geophi at Aug 3, 2025 5:22:50 PM]
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173
With the MCM1 well having run dry, I noticed an ARP1 _3 resend that errored out immediately, after the initial tasks had ended in errors after quite some runtime. I then got another _3 resend on a different host, which started immediately because that host had also run out of MCM1 work, and that one seems to be progressing normally.
https://www.worldcommunitygrid.org/contribution/workunit/750930364
The error on this quickly aborted WU states "can't get input file"...
Ralf
----------------------------------------
[Edit 1 times, last edit by TPCBF at Aug 3, 2025 5:54:35 PM]
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 981
Hi!
Is Krembil trying out running without HR? See this WU: ARP1_002459_149
https://www.worldcommunitygrid.org/contribution/workunit/747428315
_3 is mine!
And a strange one: ARP1_0025673_149
Link: https://www.worldcommunitygrid.org/contribution/workunit/748243394
I thought that Anonymous Platform did not work, or should not be allowed??
Hans S.
----------------------------------------
[Edit 1 times, last edit by Hans Sveen at Aug 3, 2025 7:59:39 PM]
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 300
I have a similar one:
https://www.worldcommunitygrid.org/contribution/workunit/750378338 (ARP1_0033245_149)
Cheers, Mark
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1293
Looks like I got an ERROR ARP too...
https://www.worldcommunitygrid.org/contribution/workunit/750976340
This next one is interesting: it downloaded, started running, and then the server aborted it.
https://www.worldcommunitygrid.org/contribution/workunit/749798059
I've never seen the server abort a WU I had already started. Honestly, I would have aborted it myself, as it is weird.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Aug 3, 2025 9:45:05 PM]
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316
Wow! -- quite a large number of reports, some of which seemed to have had "happy endings" (though I wonder if some of the reported validations can be trusted under the circumstances)...
The catalogue of mismatched tasks and strange retry issues being listed here and in the "Work available" thread is beginning to look like a sign of database malfunction or corruption, or of some server hardware malfunction. In particular, it would seem really strange to have abandoned HR (homogeneous redundancy) for ARP1 given the strict data-match criteria, and most especially to allow what appears to be an Anonymous Platform to run a task!

It looks as if retries for some WUs have been force-cancelled because the system has realized that it can't use them (it sends a slightly different Abort notice to the client). I've had one of these (see below), and it would explain what Unixchick reported just above, though it doesn't explain why those retries went out in the first place...

Here's the one I saw...
WU 749436789 -- ARP1_0028094
_3 was reported at 21:40:15 UTC; I asked for _5 at 21:40:48 UTC, and it downloaded and started a couple of minutes later. However, the next contact with WCG produced the following three relevant entries in the BOINC log (this is the Server Abort happening)...
Sun 03 Aug 2025 22:45:04 BST | World Community Grid | [error] garbage_collect(); still have active task for acked result ARP1_0028094_149_5; state 9
Note that state 9 is "Waiting to be sent" and state 5 is "Invalid" (which suggests that validation of the two Windows results had happened at that point and the system had realized the two Linux retries would be unusable!)

I do hope they'll tell us what has been going on, even if only via the Krembil WCG Operational Status tab...
Cheers - Al
geophi
Advanced Cruncher U.S. Joined: Sep 3, 2007 Post Count: 113
Mine, from when my Linux PC downloaded a retry yesterday:
WU -- ARP1_0034000_149
_0 and _1 errored (Windows), _2 and _3 errored (Linux), _4 and _5 running (Linux).
All errors are validation errors.
The last two Linux PCs to download this work unit were seemingly arbitrarily awarded a status of Valid. Who knows what really happened here? (rhetorical question)