World Community Grid Forums

Thread Status: Active. Total posts in this thread: 101
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
Adri.
[Drifting towards the margins of "off topic" here, perhaps, but...] I'll answer a couple of your questions by saying that the script keeps a catalogue of workunit names it has seen, assessed and either aborted or passed for execution. At the start of each pass through client_state it marks them all as "not seen this time"; if it sees one again it marks it as "seen" but doesn't do anything else! At the end of the pass, any names not seen on that occasion get removed from the catalogue.[*1]

If the script gets shut down, it dumps the current state of that catalogue, which it re-reads the next time it starts up; again, that should prevent repeated abort attempts in the unlikely event that reporting the aborted task has taken a long time!

The script sleeps for 5 minutes between passes, so there's a fair chance that aborted units might've vanished already. As for the "urgent" tasks, they're unlikely to get priority over existing tasks on my systems, as I only allow very small numbers (<10) of tasks for SCC1 (and MCM1, as it happens) at a time...

As for hacking on the logging module(s) to get UTC time, I probably could if I had the time to spare, but...

Cheers - Al.

P.S. [Definitely off topic :-)] I haven't even looked at puzzle creation again yet -- too much else going on at the moment :-)

[*1] All the techniques used for this script had already been employed for daemons I use to collect information on receptors and ligands for OPN1/G and SCC1, control parameters for MCM1, and task completion information for all WCG projects. The cataloguing technique described above is essential for the pre-run data collection scripts, as there are often a lot of files to check out and the checking should only be done once per task! The code of the daemons may not be optimal, but it has a proven track record :-)
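For readers curious how such a mark-and-sweep catalogue might work, here is a minimal Python sketch of the scheme Al describes (mark all as unseen, re-mark on sight, sweep the stragglers, persist on shutdown). The class and method names are illustrative only, not from his actual script:

```python
import json

class Catalogue:
    """Mark-and-sweep catalogue of workunit names, as described above."""

    def __init__(self):
        self.entries = {}  # workunit name -> "seen this pass" flag

    def begin_pass(self):
        # Start of a pass through client_state: mark everything
        # as "not seen this time".
        for name in self.entries:
            self.entries[name] = False

    def see(self, name):
        # Returns True only the first time a name is ever seen, so the
        # assess/abort decision happens exactly once per workunit.
        is_new = name not in self.entries
        self.entries[name] = True
        return is_new

    def end_pass(self):
        # Sweep: drop names not seen on this pass (task has left
        # client_state, e.g. an aborted unit that was reported).
        self.entries = {n: f for n, f in self.entries.items() if f}

    def dump(self, path):
        # On shutdown, persist the catalogue so a restart doesn't
        # trigger repeated abort attempts for the same workunits.
        with open(path, "w") as fh:
            json.dump(sorted(self.entries), fh)

    def load(self, path):
        with open(path) as fh:
            self.entries = {n: True for n in json.load(fh)}
```

The pay-off is the `see()` return value: the caller only assesses (and possibly aborts) a workunit when it returns True, so a task that lingers in client_state across several 5-minute passes is left alone.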
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1403 Status: Offline
Occasionally I get a task with a 3-day deadline, so it gets high priority and that task will always end up in the Running state -- unless I have enough (MCM1/SCC1) tasks with a 3-day deadline in the queue, which is probably never. When you get tasks with an earlier deadline, you have a better chance that tasks will run FIFO when your minimum buffer is set to 0 (zero) and your additional buffer to the maximum work buffer you want, e.g. 2 days. Completed work is reported at least 1 hour after a job has finished; the client will report it and request new work when the buffer drops below the additional amount.
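For anyone wanting to pin those buffer values without touching the web preferences, a sketch of what the 0 / 2-day setup above looks like in BOINC's global_prefs_override.xml (the tag names are BOINC's standard preference tags; the values simply mirror Crystal Pellet's example):

```xml
<!-- global_prefs_override.xml in the BOINC data directory;
     illustrative values matching the 0 / 2-day setup above -->
<global_preferences>
   <work_buf_min_days>0</work_buf_min_days>
   <work_buf_additional_days>2</work_buf_additional_days>
</global_preferences>
```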
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline
In a last-ditch attempt to stop executing any tasks from faulty batches that error out straight away (making the client look unreliable), I've tried setting <max_concurrent> for scc1 to -1 in the file app_config.xml. That worked! Now no SCC1 tasks execute at all, so any tasks from faulty batches can be aborted (User Aborted) before they start immediately upon receipt.
Reason: as soon as you're reliable again, you'll have a better chance of receiving tasks from SCC1.

In the meantime, has anyone noticed that there aren't any new tasks from faulty batch 0004176 around anymore? The last ones I received were SCC1_0004176_MyoD1-C_56409_0 and SCC1_0004176_MyoD1-C_56530_0, received at 2023-06-09T14:00:39.

Something else I noticed: when you abort a faulty _0 task, two tasks are generated, one with a 6-day deadline and another with a 3-day deadline! See below.

<1> * SCC1_0004165_MyoD1-C_4795_0 Fedora Linux User Aborted 2023-06-10T09:35:42 2023-06-10T09:37:50

Adri
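For reference, a minimal sketch of the app_config.xml workaround Adri describes. The scc1 short name and the -1 value are taken from his post; using a negative value this way is a reported trick, not a documented BOINC setting:

```xml
<!-- app_config.xml in the World Community Grid project directory;
     sketch of the workaround described above, not a documented setting -->
<app_config>
   <app>
      <name>scc1</name>
      <max_concurrent>-1</max_concurrent>
   </app>
</app_config>
```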
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1403 Status: Offline
Something that I also noticed was that when you abort a faulty _0 task, two tasks are generated, one with a 6-day deadline and another with a 3-day deadline!

Maybe that is the case when we are still early in a batch. In batch 4176 I noticed that my aborted tasks did not get any resends, but by then we had already progressed into the second half of that batch. [Edit 1 times, last edit by Crystal Pellet at Jun 10, 2023 12:45:34 PM]
Spiderman
Advanced Cruncher United States Joined: Jul 13, 2020 Post Count: 138 Status: Offline
I've not seen any additional SCC1_0004176 tasks since about 24 hours ago.

Unfortunately, four bad SCC1_0004174 tasks floated in overnight and immediately errored. One was on a brand-new machine I had just brought online -- I'm hoping that box doesn't get put on the "bad list" that others noted earlier.
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
Something that I also noticed was that when you abort a faulty _0 task, two tasks are generated, one with a 6-day deadline and another with a 3-day deadline! See below. <1> * SCC1_0004165_MyoD1-C_4795_0 Fedora Linux User Aborted 2023-06-10T09:35:42 2023-06-10T09:37:50 Adri

Not quite -- judging by the sent time on wingman 1, I'd say it had decided to send two initial tasks out because you weren't eligible for adaptive replication... Only wingman 2 seems to be a response to your User Abort, and I can find lots of evidence of genuine retries getting 3-day deadlines even when the initial failure/abort is almost instant...

To verify the above, I sifted through my recent aborted SCC1 tasks. I actually struggled to find any within the last day or so where I was wingman 0 with Adaptive Replication -- I was getting a lot of retries, so "first, solo" was quite rare :-) I followed up on all of the ones I could easily find, and noted that one or two had the replication set to zero, as was noted upstream in this thread (so no retries!) -- that tallies with what Crystal Pellet has just commented on for batch 4176 and explains Spiderman's observation... Looking at the rest, I saw the same 3-day deadline pattern for all of them!

If I have time (ha, ha!) I might try to look into all tasks, not just ones where I was wingman 0 and an AR candidate, but I suspect I'd find the same behaviour there too -- a random check on a handful of items tends to confirm that.

I'm getting to the stage where I wish they'd just turn SCC1 off until the scientists and WCG folks sort this out properly :-(

Cheers - Al.

P.S. Given your trick with max_concurrent, I have to note that my busiest system got hit by the relative lack of work around 07:00 to 10:00 UTC today and hit the "arrived and started too fast to catch" issue that we discussed earlier (the first time it has run out of SCC1 in a while!) -- however, it only took about 4 or 5 hours to get back to reliable status, so I can live with that for now :-) [Edited to reference Spiderman's comment.] [Edit 1 times, last edit by alanb1951 at Jun 10, 2023 8:30:49 PM]
sptrog1
Master Cruncher Joined: Dec 12, 2017 Post Count: 1592 Status: Offline
I just logged an error on a 4174 task with 5 entries in the results (4 errors and 1 in progress, replication 2). That in-progress guy is going to be disappointed.
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline
Something that I also noticed was that when you abort a faulty _0 task, two tasks are generated, one with a 6-day deadline and another with a 3-day deadline! See below. <1> * SCC1_0004165_MyoD1-C_4795_0 Fedora Linux User Aborted 2023-06-10T09:35:42 2023-06-10T09:37:50 Adri

Not quite -- judging by the sent time on wingman 1 I'd say that it had decided to send two initial tasks out

Yikes! I haven't been paying attention in Mr. Alanb1951's class today. It was indeed a weird observation by me, and this explains why I was wrong. Sorry!
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 396 Status: Recently Active
@TigerLily,

Can we get an update on the defective SCC batches? Any ETA for a fix?

Thanks, AgrFan
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline
@TigerLily, Can we get an update on the defective SCC batches? Any ETA for a fix? Thanks, AgrFan

+1 - an acknowledgement of the problem would be great too. Cheers [Edit 1 times, last edit by NixChix at Jun 11, 2023 8:27:32 PM]