| World Community Grid Forums
Thread Status: Active Total posts in this thread: 101
Cyclops
Senior Cruncher Joined: Jun 13, 2022 Post Count: 295 Status: Offline |
"All tasks seem to fail on different computers. Devs?"
Hi tullio, can you give us screenshots or additional information about your tasks so we can look into the issue? Thanks. |
||
|
|
phillipspencer
Advanced Cruncher France Joined: Apr 9, 2015 Post Count: 71 Status: Offline Project Badges:
|
@ Cyclops
For me, all the SCC errors I am seeing are in the same batch: SCC1_0004175_MyoD1-C. My system is Windows 11, but I see wingmen on multiple different versions of Windows also having errors. From the comments above, it looks like the whole 0004175 batch is a dud.
[Edit 1 times, last edit by phillipspencer at May 19, 2023 3:07:30 PM] |
||
|
|
Cyclops
Senior Cruncher Joined: Jun 13, 2022 Post Count: 295 Status: Offline |
"@ Cyclops For me, all the SCC errors I am seeing are in the same batch: SCC1_0004175_MyoD1-C. My system is Windows 11, but I see wingmen on multiple different versions of Windows also having errors. From the comments above, it looks like the whole 0004175 batch is a dud."
Hi phillipspencer, thanks for sharing this with me. I will send it to the team to investigate. |
||
|
|
tullio
Cruncher Joined: May 31, 2020 Post Count: 3 Status: Offline |
My computer is Rozzano1922. It is running in Science United.
Tullio |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Cyclops, a warm thank you to the WCG team for intervening in the release of faulty batch SCC1_0004175_MyoD1-C:
workunit 298080058
SCC1_0004175_MyoD1-C_24293_0 MSWin 10 Server Aborted 2023-05-19T12:25:57 2023-05-19T18:51:09
workunit 298080056
SCC1_0004175_MyoD1-C_24297_0 MSWin 11 Server Aborted 2023-05-19T12:19:02 2023-05-19T18:24:06
workunit 298080053
SCC1_0004175_MyoD1-C_24300_0 MSWin 10 Server Aborted 2023-05-19T12:18:56 2023-05-19T21:03:39
workunit 298133969
SCC1_0004175_MyoD1-C_37044_0 Other
workunit 298133972
SCC1_0004175_MyoD1-C_37055_0 Other
workunit 298134002
SCC1_0004175_MyoD1-C_37037_0 Other
Adri |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
I have a rash of SCC units which have been "Server Aborted". All of them are work units which have been issued 3 times within less than 10 minutes of each other. All of them are minimum quorum 2 and Replication 2. There must be a glitch in the feeder system because it would be normal to only issue 2 work units and only issue the third unit if one of the first 2 had an error. I would surmise this is creating a bit of extra usage of bandwidth and overhead which is unnecessary.
Result name                   OS type  OS version  Status          Sent time
SCC1_0004083_MyoD1-A_17550_0  Linux                Valid           2023-05-20 22:15:01
SCC1_0004083_MyoD1-A_17550_1  Linux                Server Aborted  2023-05-20 22:15:11
SCC1_0004083_MyoD1-A_17550_2  Linux                Valid           2023-05-20 22:21:54
Sgt. Joe
*Minnesota Crunchers* |
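For anyone who wants to look for this pattern in their own results, here is a minimal Python sketch of the check described in the post above: it groups tasks by workunit and flags workunits where more copies than the replication level were sent within a short window. The helper names (`workunit_of`, `find_early_extra_copies`) and the plain (name, sent time) input are assumptions made for illustration and are not part of any WCG tool or API; the sample data are the three rows from the table above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def workunit_of(task_name: str) -> str:
    """Strip the trailing replica suffix (_0, _1, _2, ...) from a task name."""
    return task_name.rsplit("_", 1)[0]


def find_early_extra_copies(tasks, replication=2, window=timedelta(minutes=10)):
    """Return {workunit: time span} for workunits that had more than
    `replication` tasks sent within `window` of each other."""
    sent_by_wu = defaultdict(list)
    for name, sent in tasks:
        sent_by_wu[workunit_of(name)].append(
            datetime.strptime(sent, "%Y-%m-%d %H:%M:%S")
        )
    flagged = {}
    for wu, times in sent_by_wu.items():
        span = max(times) - min(times)
        if len(times) > replication and span < window:
            flagged[wu] = span
    return flagged


# The three tasks listed in the post above.
tasks = [
    ("SCC1_0004083_MyoD1-A_17550_0", "2023-05-20 22:15:01"),
    ("SCC1_0004083_MyoD1-A_17550_1", "2023-05-20 22:15:11"),
    ("SCC1_0004083_MyoD1-A_17550_2", "2023-05-20 22:21:54"),
]
flagged = find_early_extra_copies(tasks)
for wu, span in flagged.items():
    print(wu, "extra copy sent within", span)
```

The 10-minute window and replication of 2 simply mirror the numbers quoted in the post; both are parameters so they can be adjusted.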
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
My device has received one task that was Server Aborted - else I probably wouldn't have noticed it - as part of a triplet as well.
I may have found a glitch in there:
(output generated by 'wcgstats -Hf= 298497159 -SJr')
workunit 298497159 App: Smash Childhood Cancer
----------------------------------------
Details:
SCC1_0004138_MyoD1-B_86586_0 CentOS Linux Valid 2023-05-23T09:54:57 2023-05-23T10:45:51 0.84/0.84
As one can see, the Due time for task _0 was extremely short: 1 minute and 27 seconds.
Sgt.Joe, could you check the Due time for the three tasks in (one of) your 'problem' cases?
These cases are very rare in my opinion and there is probably, maybe, a relation with the error in the API that occasionally pops up as well, where one of the dates is temporarily misrepresented:
CpuTime Elapsed Claimed Granted ModTime Exit Outc SentTime ReceivedTime Name
You see, today it is 2023-05-23 and the server reported that they received my result at 2023-05-26T21:28:07, that's (still) three days in the future! Amazing, since final validation was at Sat 20 May 21:12:27 UTC 2023 (see below, "ModTime": 1684617147). [Try this: date -d@1684617147 -u]
Then, when I check my results from the API at a later moment in time (taken from local file "wcgresults.2023-05-21T00:59:01.167780.0"), I see everything is in order:
{
Adri
PS I have reported about this anomaly in the IBM days as well, more than once, in case anyone wonders. Example: post 541622 about "API Returning Conflicting Data" in 2017.
[Edit 1 times, last edit by adriverhoef at May 23, 2023 1:57:42 PM] |
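The `date -d@1684617147 -u` check in the post above can also be reproduced in Python, which may help on systems without GNU `date`. This is only a sketch: the function name `epoch_to_utc` is made up for illustration, and the two timestamps are the ones quoted in the post.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds: int) -> datetime:
    """Convert a Unix timestamp (such as "ModTime") to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# "ModTime" value quoted in the post.
mod_time = epoch_to_utc(1684617147)
print(mod_time.isoformat())  # 2023-05-20T21:12:27+00:00

# ReceivedTime as reported by the API, compared with the day the post was written.
received = datetime(2023, 5, 26, 21, 28, 7, tzinfo=timezone.utc)
observed_on = datetime(2023, 5, 23, tzinfo=timezone.utc)
print("received time is in the future:", received > observed_on)
print("received after final validation:", received > mod_time)
```

Both comparisons come out true for the values in the post, which is exactly the inconsistency being described.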
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Recently Active Project Badges:
|
Adri, Sgt. Joe;
I've had just two of these (out of over 200 SCC1 jobs in the 24 hours to 01:00 UTC on 23rd); a few details below on one of them...
The other WU was similar but the _0 task was given a little bit longer :-)
By the way, I've also noticed other cases of these nonsense deadlines (some more tight than others!) across various WCG apps, as well as the future return time issue! I even had to put some defensive coding into one of my scripts that used the older API to stop it trying to process records with due time or return time less than sent time (though it doesn't trigger very often!)
I had wondered whether the bizarre deadlines were something that happens when a user resets a system which had some tasks that didn't yet have a wingman (so on the next fetch the task might get re-sent with an inappropriate deadline...) However, evidence for that was inconclusive given the data we can see as [mere] users :-)
Curiouser and curiouser...
Cheers - Al.
[Edit - I added something about that thread Adri mentioned, but on re-consideration I deleted it again as not relevant!]
[Edit 3 times, last edit by alanb1951 at May 23, 2023 3:25:48 PM] |
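A defensive check of the kind described in the post above might look roughly like the following Python sketch. It is not Al's actual script: the record layout and the field names `sent_time`, `due_time` and `received_time` are hypothetical, and the extra "received in the future" test is an optional addition based on the anomaly reported earlier in the thread. The sample records are invented purely to exercise the function.

```python
from datetime import datetime
from typing import Optional


def parse(ts: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp, passing through missing values."""
    return datetime.fromisoformat(ts) if ts else None


def is_plausible(record: dict, now: Optional[datetime] = None) -> bool:
    """Reject records whose due or received time precedes the sent time.
    If `now` is given, also reject received times that lie in the future."""
    sent = parse(record.get("sent_time"))
    due = parse(record.get("due_time"))
    received = parse(record.get("received_time"))
    if sent is None:
        return False
    if due is not None and due < sent:
        return False
    if received is not None and received < sent:
        return False
    if now is not None and received is not None and received > now:
        return False
    return True


# Invented sample records (hypothetical field names and values).
ok = {"sent_time": "2023-05-23T09:00:00", "due_time": "2023-05-29T09:00:00",
      "received_time": "2023-05-23T10:00:00"}
bad_due = {"sent_time": "2023-05-23T09:00:00", "due_time": "2023-05-23T08:00:00"}
future = {"sent_time": "2023-05-20T09:00:00", "received_time": "2023-05-26T21:28:07"}

now = datetime(2023, 5, 23, 12, 0, 0)
for rec in (ok, bad_due, future):
    print(is_plausible(rec, now=now))
```

Skipping a suspicious record and retrying later seems reasonable here, since the posts above suggest the misreported dates correct themselves after a while.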
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Al,
I'm always happy to see confirmation of my observations by other proof, like yours. Nice to view your output as well.
As I don't think that these posts belong here, I'd suggest posting sightings of these cases in a better fitting (new?) thread. I've come across some new cases as well.
If you're looking for tasknames ending in _2 - and this is especially true for SCC1 - they can more or less easily be spotted, if you have (had) them in your queue, of course. Obviously, using a script that makes use of the API makes life a lot easier in this case.
Adri |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
"Sgt.Joe, could you check the Due time for the three tasks in (one of) your 'problem' cases?"
The next time I run across a batch of these, I will look at the "Due Time" as well. Thanks.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
|