Thread Status: Active
Total posts in this thread: 101
This topic has been viewed 20351 times and has 100 replies
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1296
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I got a few errors from "C" tasks, and am no longer getting SCC. I'm asking, but none are being sent.

Am I not getting SCC because of my errors, or are other people also not seeing SCC without having had errors? Is it me, or the system?
[Jun 9, 2023 4:12:23 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Loads of 4174's and a 4165 here errored multiple times.

Mike
[Jun 9, 2023 4:25:50 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I notice that when I get a batch of ATOM 62 errors, they error in quick succession, so a number of them get uploaded together. However, my cache is only replenished one task at a time, and spasmodically at that. In between I get the dreaded "tasks committed to other platforms" message.

Mike
[Jun 9, 2023 4:56:27 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Unixchick, the number of new SCC1 tasks being distributed is dropping fast, because the server needs (a) to find reliable clients and (b) to abort (Server Abort) tasks from the faulty batch (those with Replication > 0) that will be 'Too Late' anyway. Reliable clients are getting scarce: tasks from the faulty batch (still with Replication > 0) error out immediately, which makes more clients unreliable, which in turn increases the number of tasks that need to be verified.
So, SCC1-tasks are still being distributed, but the system has difficulty finding reliable clients. This is the same situation as reported in post 686894. The good news is that the server is holding up.
Still, in this situation I think that it is a good idea to abort (User Abort) the faulty tasks that you receive, for you will lose your reliability status if you execute a faulty task and as long as you have a reliable client your tasks don't need verification, giving the server more breathing room and more chance to send some tasks to you.

Adri
[Jun 9, 2023 5:06:31 PM]
NixChix
Veteran Cruncher
United States
Joined: Apr 29, 2007
Post Count: 1187
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I don't understand why this problem is not being addressed by WCG staff.

Cheers
[Jun 9, 2023 5:24:20 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Adri,

From your post 687153 from a few hours back...
Nice to see that somebody else (see task _3 below) also (probably automatically(*1) (see post 686915)) aborts incoming tasks from the 'new' faulty batch 0004174

In this case that would've been me :-) I've written a Python script that scans client_state.xml for tasks that could be from invalid work-units, finds the specific flex file, checks it for the fault and invokes boinccmd to abort the task if appropriate. Here's a sample from its log on one of my machines (times are BST [UTC+1][*1]):

2023-06-08 19:17:11 - SCC1_0004176_MyoD1-C_50786_0: aborted.
2023-06-09 04:32:21 - SCC1_0004165_MyoD1-C_0518_0: aborted.
2023-06-09 08:07:26 - SCC1_0004174_MyoD1-C_0213_1: aborted.
2023-06-09 09:52:28 - SCC1_0004174_MyoD1-C_0092_3: aborted.
2023-06-09 11:57:30 - SCC1_0004165_MyoD1-C_1998_0: aborted.
2023-06-09 12:02:30 - SCC1_0004165_MyoD1-C_2032_0: aborted.
2023-06-09 12:12:31 - SCC1_0004165_MyoD1-C_2086_1: aborted.
2023-06-09 13:27:33 - SCC1_0004176_MyoD1-C_56113_0: aborted.
2023-06-09 13:42:34 - SCC1_0004174_MyoD1-C_1241_2: aborted.
2023-06-09 14:07:34 - SCC1_0004174_MyoD1-C_1565_1: aborted.
2023-06-09 16:17:37 - SCC1_0004165_MyoD1-C_2405_3: aborted.
2023-06-09 17:42:38 - SCC1_0004174_MyoD1-C_2246_2: aborted.

If/when it sees a MyoD1-C task that doesn't have the bad flex file, the script will report "valid file!" and leave the task to run :-)
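
For readers wondering what such a script could look like, here is a minimal sketch. It is not the script described above. It assumes that client_state.xml lists each task as a `<result>` with `<name>` and `<wu_name>`, and each work-unit as a `<workunit>` with `<file_ref>`/`<file_name>` children; that the flex file can be recognised by "flex" in its name; and that the project URL matches the one your client reports. The helper names and the `looks_faulty` check are hypothetical, since the thread does not say how the fault shows up inside the file. The sketch only builds the `boinccmd --task <url> <task> abort` command lines and does not run them.

```python
import xml.etree.ElementTree as ET

# Assumption: the project URL as registered in your BOINC client
# (check it with `boinccmd --get_project_status`).
PROJECT_URL = "https://www.worldcommunitygrid.org/"


def candidate_tasks(client_state_xml, marker="MyoD1-C"):
    """Return (task_name, input_file_names) for tasks whose name contains marker."""
    root = ET.fromstring(client_state_xml)
    # Map each work-unit name to the input files it references.
    files_by_wu = {}
    for wu in root.iter("workunit"):
        wu_name = wu.findtext("name", default="").strip()
        files_by_wu[wu_name] = [
            ref.findtext("file_name", default="").strip()
            for ref in wu.iter("file_ref")
        ]
    tasks = []
    for result in root.iter("result"):
        task = result.findtext("name", default="").strip()
        wu_name = result.findtext("wu_name", default="").strip()
        if marker in task:
            tasks.append((task, files_by_wu.get(wu_name, [])))
    return tasks


def abort_command(task_name, project_url=PROJECT_URL):
    """Build (but do not run) the boinccmd call that aborts one task."""
    return ["boinccmd", "--task", project_url, task_name, "abort"]


def plan_aborts(client_state_xml, looks_faulty):
    """List abort commands for candidate tasks whose flex file looks faulty.

    looks_faulty is a caller-supplied function (hypothetical) that takes a
    file name and returns True when the file shows the fault.
    """
    commands = []
    for task, files in candidate_tasks(client_state_xml):
        # Assumption: the flex file has "flex" somewhere in its name.
        flex_files = [name for name in files if "flex" in name]
        if any(looks_faulty(name) for name in flex_files):
            commands.append(abort_command(task))
    return commands
```

Each returned command could then be handed to `subprocess.run`. Keeping the "decide" step separate from the "abort" step makes a dry run possible before anything is actually aborted.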

Your logic for aborting parallels mine, and the effect is obvious... The machine from which that log snippet is taken typically returns about 100 valid SCC1 tasks a day; since I introduced the script I've not had any Errors (as expected), so I still manage to keep my [small] cache topped up despite still seeing "Tasks are committed to other platforms" fairly regularly (for reasons stated frequently in this and other threads...) My other systems that run SCC1 are also getting consistent supplies of work (but they don't handle as many SCC1 tasks a day).

Cheers - Al.

[*1] The script is based on the daemon scripts I've written for various other aspects of watching WCG work flow; they all use Python's logging module for the output and I've never bothered to work out how to get it to use UTC instead of local time (if it even can...)
[Jun 9, 2023 6:15:50 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1296
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Thanks for the replies. I had a short queue and got a couple of error WUs in a row, that ran before I could abort them. I'm guessing that I'm now deemed unreliable for SCC. I've added MCM to my mix for the moment.

I too am surprised about the lack of attention to this problem.
[Jun 9, 2023 6:46:45 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

And 4099. But they will be off for the weekend now!

Mike
[Jun 9, 2023 7:04:49 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

And 4099. But they will be off for the weekend now!

Mike
Well, yes, WCG Towers always does this right before the weekend. Nothing new here, besides that communication over the last week has been even more abysmal than before...

But I can't confirm that SCC1 batch 4099 is bad per se, I just checked several hosts that have some of those and all of them are at least starting and running fine, though I didn't see any that had already finished.
So if there is a problem with that particular batch, then it is different from the subject of this thread, for which I have seen WUs of the batches 4165, 4174, 4175 and 4176, and which will error out right when they are being started.

And I do not agree with Adri that they can't do anything about this. The question is rather whether they KNOW how and where to cancel such jobs and, more importantly, whether they can be actually proactive and prevent the root cause of those faulty batches being created in the first place. But that's something that only WCG Towers could answer (if they are truthful and don't spread more platitudes), but right now, they once again ain't talking...

Ralf
[Jun 9, 2023 7:36:54 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Al, thanks for your response.

You wrote:
I've written a Python script that scans client_state.xml for tasks that could be from invalid work-units, finds the specific flex file, checks it for the fault and invokes boinccmd to abort the task if appropriate.

"And I wonder, still I wonder, who'll stop the ..."(*1)

Great! And I wonder, if people are getting inquisitive and interested in your script.

Still I wonder, how does that script handle the situation where a task is received that needs to be executed right away because its deadline is only 3 days instead of 6?
(Occasionally I get a task that has a deadline of 3 days, so it gets a high priority to run and this will always lead to that task being in Running state, unless I have enough (MCM1/SCC1) tasks with a 3 day deadline in the queue, which is probably never.)

If/when it sees a MyoD1-C task that doesn't have the bad flex file, the script will report "valid file!" and leave the task to run :-)

So, the task stays in the queue, unharmed. Good. The conceivable situation hasn't happened yet, I guess, but (I'm thinking along with you) what will happen when that script sees the same task again on a later scan? Will it report "valid file!" again?

they all use Python's logging module for the output and I've never bothered to work out how to get it to use UTC instead of local time

So it isn't as simple as searching for 'python date utc' on the internet and then finding this:
>>> from datetime import datetime, timezone
>>> print(datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"))
Nevertheless, I think you should keep local time and just be aware of it. Logging, a nice feature of Python.
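
For reference, the standard logging module takes its time conversion from the formatter's `converter` attribute, which defaults to `time.localtime`; setting it to `time.gmtime` should give UTC timestamps in the log. A minimal sketch, assuming a log format like the one in Al's sample; `make_utc_formatter` and `format_at` are names made up for this illustration:

```python
import logging
import time


def make_utc_formatter():
    """Formatter that renders %(asctime)s in UTC instead of local time."""
    formatter = logging.Formatter(
        fmt="%(asctime)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    # logging calls formatter.converter(record.created) to get a struct_time.
    formatter.converter = time.gmtime
    return formatter


def format_at(formatter, epoch_seconds, message):
    """Format a message as if it had been logged at the given epoch time."""
    record = logging.LogRecord(
        name="demo", level=logging.INFO, pathname="", lineno=0,
        msg=message, args=None, exc_info=None,
    )
    record.created = epoch_seconds
    return formatter.format(record)
```

In a real script the formatter would be attached to a handler with `handler.setFormatter(make_utc_formatter())`; `format_at` is only there to show the effect with a fixed timestamp.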

[*1] faulty tasks/workunits/batches

Adri
PS I don't have a weekend puzzle ready at this time.
[Jun 9, 2023 10:12:33 PM]