World Community Grid Forums
Thread Status: Active Total posts in this thread: 8
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 273 Status: Offline
I had two work units fail in an interesting way:
ARP1_0012386_139_0
ARP1_0031760_139_0

On both of these WUs, all of the other wingmen errored out before my cruncher could finish and turn in its result. Checking the logs, both of mine ran to completion. Could these units be triggering some sort of CPU bug, or was I just extremely unlucky that these units hit a bunch of hosts with other problems (out of disk space, memory issues, etc.)?

[Edit 1 times, last edit by spRocket at Nov 5, 2024 2:20:00 PM]
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2145 Status: Offline
Your wingmen failed to download some input file(s), spRocket; that's why they errored out.

The same thing happened to two tasks on my devices:

workunit 626535846
Details:
ARP1_0030653_139_0  Linux Fedora  2Late  2024-11-04T08:21:25  2024-11-04T18:53:00  7.33/7.39  503.3/0.0

The second one is not quite the same (yet?), as it still has one wingman In Progress:

workunit 626514226
ARP1_0014490_126_0  Fedora Linux  2Late  2024-11-04T07:23:47  2024-11-05T03:10:13  14.39/15.69  632.1/0.0

Adri

[Edit 1 times, last edit by adriverhoef at Nov 5, 2024 2:54:14 PM]
gj82854
Advanced Cruncher Joined: Sep 26, 2022 Post Count: 96 Status: Offline
I had two WUs that were also listed as being too late but were returned well within the deadline. I went back to check some more details about 10 minutes later and both were not listed as being too late anymore. Since I didn't make note of the WU names the first time I was not able to determine their ultimate status but as of now, I don't have any listed as being too late out of about 156 returned. I'm thinking, without further evidence, that it is a temporary status.
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2145 Status: Offline
The workunit that I reported earlier (being 'Too Late' to validate) finally ended up in 'Error', while two of the devices seemed to return a valid result within the projected deadline(s):
workunit 626514226  App: Africa Rainfall Project
Details:
ARP1_0014490_126_0  Fedora Linux  Error  2024-11-04T07:23:47  2024-11-05T03:10:13  14.39/15.69  632.1/0.0

Adri
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1453 Status: Offline
Same here: I have one task that crunched for 15 hours and was returned in less than 2 days, yet it is marked "Too Late".

ARP1_0001790_139_1  Linux Ubuntu  status: Too Late  sent: 2024-11-04 07:46:37 UTC  returned: 2024-11-05 23:07:24 UTC
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12310 Status: Offline
The problem here is that 5 copies were returned with errors, for whatever reason, before the valid one came back, and by that time the unit had already been automatically Errored Out.

Mike
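
To make the sequence Mike describes concrete, here is a minimal Python sketch of how a per-workunit error limit could turn a good result into "Too Late". It is a toy model, not the actual BOINC or World Community Grid server code: the class `WorkUnit`, the helper `replay`, the status strings, and the limit of 5 (taken only from the numbers quoted in this thread) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: a limit of 5 is inferred from the posts in this thread,
# not read from World Community Grid's real configuration.
MAX_ERROR_RESULTS = 5


@dataclass
class WorkUnit:
    """Toy model of a workunit; hypothetical, not the real server logic."""
    name: str
    error_count: int = 0
    errored_out: bool = False
    outcomes: dict = field(default_factory=dict)

    def report(self, copy: str, ok: bool) -> str:
        """Record one returned copy and return the status it is shown with."""
        if self.errored_out:
            # The unit was already abandoned, so even a good result
            # can no longer be validated.
            status = "Too Late"
        elif ok:
            status = "Pending Validation"
        else:
            self.error_count += 1
            status = "Error"
            if self.error_count >= MAX_ERROR_RESULTS:
                self.errored_out = True
        self.outcomes[copy] = status
        return status


def replay(name, reports):
    """Feed (copy, ok) reports to a fresh workunit in arrival order."""
    wu = WorkUnit(name)
    for copy, ok in reports:
        wu.report(copy, ok)
    return wu


if __name__ == "__main__":
    # Five copies fail quickly; the one good copy comes back afterwards.
    reports = [("_0", False), ("_1", False), ("_2", False),
               ("_3", False), ("_5", False), ("_4", True)]
    wu = replay("ARP1_example", reports)
    print(wu.outcomes["_4"])  # prints "Too Late"
```

In this model the deadline never matters: the good copy is flagged only because it arrived after the fifth error. If the same good copy is reported before the fifth error, the sketch leaves it pending validation instead.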
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1944 Status: Offline
Just noticed that one ARP1 WU that one of my hosts returned during the day is marked as "too late", even though the deadline was supposed to be 11/12. The 5 other copies sent out (_0-_3, _5) all resulted in "error", and another copy, _6, just shows "other" with no further information. (Btw, these are all Windows 10 hosts.)

[Edit 1 times, last edit by TPCBF at Nov 9, 2024 4:20:45 AM]
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12310 Status: Offline
Copy _6 was probably never sent, as 5 errors had already occurred.
|