World Community Grid - View Thread - [Error] ATOM syntax incorrect: "62 " is not a valid atom number

World Community Grid Forums

Category: Completed Research

Forum: Smash Childhood Cancer

Thread: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Thread Status: Active
Total posts in this thread: 101

This topic has been viewed 20344 times and has 100 replies

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Why the techs and scientists have not purged the 4176 batch from the system is a mystery. You would think, by now, that someone would have noticed that this batch is defective, probably in its entirety. Any little blurb of news acknowledging the problem would certainly be appreciated.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jun 3, 2023 1:46:24 PM]

AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 396

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

They're busy manually adding new devices to My Contribution pages.

----------------------------------------

i5-10400 (Comet Lake, 6C/12T) @ 2.9 GHz
i5-7400 (Kaby Lake, 4C/4T) @ 3.0 GHz
i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
i5-3330 (Ivy Bridge, 4C/4T) @ 3.0 GHz

----------------------------------------
[Edit 1 times, last edit by AgrFan at Jun 3, 2023 2:03:35 PM]

[Jun 3, 2023 1:58:05 PM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

They're busy manually adding new devices to My Contribution pages.

That sounds about right... sad

Ralf

[Jun 3, 2023 2:30:36 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

May I correct you, Sgt.Joe? They have noticed that batch 0004176 is defective. It's just not that easy to decide how to tackle the problem; otherwise they would probably have used a simple method. I've seen several 'methods' come and go. The latest one seems to be to just let the task run (which takes less than a second), have it marked Error once it has been returned to the server, and then not release a _1 task.(*1)

[*1] Since 18:30 UTC last Friday, all my 100 returned tasks (BTW, all _0s) from the faulty batch have been marked as Error (apart from 13 _0s that were User Aborted), and none of them got a resend (_1).

Of course this is all based on empirical data: getting results, making an observation, developing an idea, testing the idea, and making a conclusion.

Adri

[Jun 3, 2023 2:53:27 PM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

As Adri and I have both observed, there's evidence that they've been working on it, but it would appear to be a non-trivial task (see below). As that's the case, some information would've been welcome but I suspect it's a case of fighting the fire rather than talking about it (too few staff to do otherwise, I fear...) -- that said, I find it a bit disappointing that we don't even get "Yes, we know there's a problem, and we're working on it"[1]...

For information (with apologies to anyone who already knows this...):

A standard BOINC set-up has an Ops web page that collects the parameters for cancelling jobs. In the source I've looked at, there are three ways of telling the system what to cancel:

ID range
ID list
SQL where clause

Jobs and results for cancelled tasks are marked as 'no longer needed' and returned results are not given credit (not that that's a big deal if all returns are errors or aborts!)

If the IDs of the unwanted work units are sequential and uninterrupted, the first option seems like an easy method! However, there is some evidence that this is not always the case -- I don't get enough SCC1 work on any given day to be likely to get a run of consecutive work unit IDs, but what I do see suggests that there may be lots of [relatively] short interleaved sequences for the individual targets[2]. And, of course, there might be some non-SCC1 work within the overall sequence as well...

So it may take a lot of ID ranges to do things that way (and, of course, they'd have to find out what said ranges were in the first place!)
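For anyone wondering how quickly those ranges add up, here is a minimal sketch (plain sh plus awk, nothing WCG-specific) that collapses a list of workunit IDs into start-end ranges, i.e. the form the "ID range" option would need. The function name to_ranges is made up for illustration, and it assumes you already have the unwanted IDs, one per line.

```shell
#!/bin/sh
# Sketch: collapse workunit IDs (one per line, any order) into start-end
# ranges, the shape an "ID range" cancellation would need.
to_ranges() {
  sort -nu | awk '
    NR == 1         { start = prev = $1; next }
    $1 == prev + 1  { prev = $1; next }           # contiguous: extend current range
                    { print start "-" prev        # gap: close current range...
                      start = prev = $1 }         # ...and open a new one
    END             { if (NR > 0) print start "-" prev }'
}

printf '%s\n' 312357526 312357528 312357529 | to_ranges
```

Fed the three batch-0004176 workunit IDs quoted later in this thread (312357526, 312357528 and 312357529), it prints two ranges, 312357526-312357526 and 312357528-312357529, because 312357527 in between belongs to another batch. Every interleaved WU from another batch or target costs one extra range.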

The ID list method would probably only be useful to kill off a handful of WUs, and we obviously aren't talking such small numbers of problem tasks here!

So the most elegant solution would be to craft an SQL where-clause that picks up work units for the correct application and work-unit name structure (to pick the right "batch" and "target") and ignores WUs that already have a canonical result... I think the form shows a list of the targeted WUs before the cancellation request is submitted, and I suspect that might limit the number of WUs that can be cancelled in each pass!
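As an illustration of such a where-clause, here is a hedged sketch that simply prints one for a given batch and target. It assumes the stock BOINC workunit table columns appid, name and canonical_resultid (which stays 0 until a canonical result exists). The function where_clause and the appid value 999 are placeholders: the real SCC1 application ID isn't mentioned anywhere here, and WCG's modified setup may use something else entirely.

```shell
#!/bin/sh
# Sketch only: print a where-clause for one batch/target.
# Assumes stock BOINC "workunit" columns (appid, name, canonical_resultid);
# the appid passed in is a placeholder, not the real SCC1 application ID.
# In a MySQL LIKE pattern "_" matches any single character, so the literal
# underscores in the work unit name are escaped as "\_" (the here-doc keeps
# "\_" as-is because backslash is only special before $, `, \ and newline).
where_clause() {
  appid=$1 batch=$2 target=$3
  cat <<EOF
appid = ${appid} AND name LIKE 'SCC1\_${batch}\_${target}\_%' AND canonical_resultid = 0
EOF
}

where_clause 999 0004176 MyoD1-C
```

The escaping matters when you're about to cancel things: an unescaped "_" in the LIKE pattern could in principle match names you never intended.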

The above relates to "standard BOINC" -- who knows what changes might have been made by IBM for WCG :-)

By the way, I find it interesting that the "within batch" numbers at the end of the work unit names are scattered around, rather than increasing with rising work unit ID... I suspect Adri may have noted this when looking at his data sets, and it sticks out like a sore thumb in my database online displays :-)

Cheers - Al.

[1] Perhaps the response of some users to that sort of message has put them off? It would certainly irritate me if I still had a job that included a support role!...

[2] I use "short" relative to the total number of WUs being created -- I've seen evidence of sequences of well under 1000...

[Jun 3, 2023 5:42:50 PM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

May I correct you, Sgt.Joe?

Please do. I am always willing to learn more. I do note that Al says "I find it a bit disappointing that we don't even get 'Yes, we know there's a problem, and we're working on it'". It would take less than 30 seconds to type this and put it in the forum or a news release. Apparently Al is correct that it is not a trivial matter to purge a particular batch, because I just got another one. True, they end almost immediately with an error, but that affects the reliability of the machine and causes some queue anomalies until the reliability status is restored.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jun 3, 2023 6:05:18 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Al:

By the way, I find it interesting that the "within batch" numbers at the end of the work unit names are scattered around, rather than increasing with rising work unit ID... I suspect Adri may have noted this when looking at his data sets, and it sticks out like a sore thumb in my database online displays :-)

Indeed, it has always been this way as far as I can remember and it's the reason why the thread Weekend Puzzles was created by SekeRob, or so it seems. Well, almost. SekeRob was showing a list of ResultIDs and their individual WorkunitIDs. Of course, if you look at it that way, a WorkunitID can appear multiple times as it can 'contain' multiple ResultIDs.

Let's have a look at two tasks that I've just received from faulty batch 0004176:
workunit 312357528

SCC1_0004176_MyoD1-C_22087_0  Fedora Linux  U.Aborted  2023-06-03T23:59:53  2023-06-04T00:02:00

workunit 312357529

SCC1_0004176_MyoD1-C_22096_0  Fedora Linux  U.Aborted  2023-06-03T23:59:53  2023-06-04T00:02:00

The two tasks belong to two separate workunits, each with only one result. Although their sequences are 9 apart (22087 for the first one, 22096 for the other), their WorkunitIDs are neighbours: 312357528 and 312357529.
Makes you wonder what their neighbours are, doesn't it? wink

Here is the answer:
(Output generated by 'wcgstats -frSS= 312357527')
workunit 312357527

SCC1_0004099_MyoD1-A_1044_0  Darwin        In Progr.  2023-06-03T23:59:52  2023-06-09T23:59:52

(Output generated by 'wcgstats -frSS= 312357530')
workunit 312357530

SCC1_0004099_MyoD1-A_1043_0  Fedora Linux  In Progr.  2023-06-03T23:59:53  2023-06-09T23:59:53

Oh! Look at the coloured taskname. It means that that coloured task was received on one of my own devices, said the WU hog. devilish

(I was sheltering 25 SCC1-tasks on that device at that moment, 2023-06-03T23:59:53.)
Anyway. It's interesting to see that (in batch 0004099 with type MyoD1-A) sequence 1043 from workunit 312357530 and sequence 1044 from workunit 312357527 are 3 workunits apart, while the sequences 1043 and 1044 are adjacent.
Makes you curious what's up with workunits 312357526 and 312357525, adjacent to 312357527 above. This is what I see:

(Output generated by 'wcgstats -frSS= 312357526')
workunit 312357526

SCC1_0004176_MyoD1-C_22097_0  Linuxmint     Error      2023-06-03T23:59:51  2023-06-04T00:01:58

(Output generated by 'wcgstats -frSS= 312357525')
workunit 312357525

SCC1_0004159_MyoD1-B_19222_0  MSWin 10      In Progr.  2023-06-03T23:59:50  2023-06-09T23:59:50

So, in any case, it must be clear at this point that there is a scatter of types A, B and C if you 'follow' the WorkunitIDs incrementally. It's a mix of sequences within a batch, too, as Al already noted.
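The same scatter can be reproduced from the six workunits quoted in this post. A small sketch, assuming nothing beyond sort and awk: sort the (WorkunitID, task name) pairs numerically and split each name SCC1_<batch>_<target>_<sequence>_<suffix> on underscores. The variable data and the function by_wu_id are illustrative names only.

```shell
#!/bin/sh
# WorkunitIDs and task names copied from the wcgstats output in this post,
# deliberately listed out of order.
data='312357530 SCC1_0004099_MyoD1-A_1043_0
312357528 SCC1_0004176_MyoD1-C_22087_0
312357525 SCC1_0004159_MyoD1-B_19222_0
312357527 SCC1_0004099_MyoD1-A_1044_0
312357529 SCC1_0004176_MyoD1-C_22096_0
312357526 SCC1_0004176_MyoD1-C_22097_0'

# Sort numerically on the WorkunitID, then print: WU-ID batch target sequence.
by_wu_id() {
  sort -n | awk '{ split($2, f, "_"); print $1, f[2], f[3], f[4] }'
}

printf '%s\n' "$data" | by_wu_id
```

Reading the target column top to bottom gives MyoD1-B, -C, -A, -C, -C, -A, and within batch 0004176 the sequences come out as 22097, 22087, 22096 as the WorkunitIDs rise.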

Adri

----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 4, 2023 9:58:09 AM]

[Jun 4, 2023 1:05:07 AM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I am always willing to learn more.
(...) I just got another one. True, they end almost immediately with an error, but that affects the reliability of the machine and causes some queue anomalies until the reliability status is restored.

If you have enough SCC1 tasks in your queue, the faulty tasks don't start as soon as you receive them. That means you can write a script that aborts them automatically within a few minutes (hence 'sleep 120' below), before they start running and affect your machine's reliability.
This will help:

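(cd ~boinc/projects/www.worldcommunitygrid.org/ &&
   while sleep 120; do
      # collect the names of queued tasks from the faulty batch (any _N suffix)
      a=$(wcgresults -HAT | grep 'SCC1_0004176_MyoD1-C_.*_.$') &&
      # only continue if a downloaded .pdbqt input file contains the offending ATOM line
      grep -l '^ATOM    62 ' [0-9a-f]*.pdbqt 2>/dev/null &&
      for task in $a; do boinccmd --task http://www.worldcommunitygrid.org/ "$task" abort; done
   done)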

If you don't have 'wcgresults' installed, you'd have to use this piece of code(*1) instead of the former a= assignment above:

a=$(boinccmd --get_tasks | sed -n /SCC1_0004176_MyoD1-C_/s/WU.name://p | sed s/$/_0/)

Adri
EDIT: [*1] NB: this last piece of code only works for suffix _0; with a little tweak you can make it work for any suffix devilish
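One possible version of that tweak, as a sketch: read the task's own "name:" line from boinccmd --get_tasks instead of the "WU name:" line, so the _N suffix comes along and nothing has to be appended. It assumes the usual layout of that output (indented "name: <task>" lines); the function name faulty_tasks is made up.

```shell
#!/bin/sh
# Print full task names (with whatever _N suffix) from the faulty batch.
# Reads `boinccmd --get_tasks` output on stdin; "   WU name: ..." lines are
# skipped because the pattern requires "name:" right after the indentation.
faulty_tasks() {
  sed -n 's/^ *name: \(SCC1_0004176_MyoD1-C_.*\)$/\1/p'
}

# Usage on a live client, replacing the a= assignment:
#   a=$(boinccmd --get_tasks | faulty_tasks)
```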

----------------------------------------
[Edit 4 times, last edit by adriverhoef at Jun 4, 2023 7:56:19 AM]

[Jun 4, 2023 1:41:56 AM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Al:

I find it a bit disappointing that we don't even get "Yes, we know there's a problem, and we're working on it"

Let's suppose the techs did post something like that. There are things you can explain and things you can't. In the techs' shoes, you could post in the forums that you're working on it, but as we all know, this will provoke reactions like "When will it be solved?" (which can't be answered exactly(*1)), or worse: "Why haven't you (so and so)?" and "Why couldn't you have (this and that)?". Then they would have to respond to those, and it never ends, because answering questions like these isn't productive. What's more, "why" questions can't be answered logically when it comes to human behaviour. You never know everything that's going on: a team meeting, working hours (it never ends), how bad the problem is, whether there are other fires to put out first, assigning people to the job, people being sick, etc.

[*1] And even if answered with an estimate, it can overrun its time or even get out of hand and then they would have to post another message. And another, which will elicit even more reactions. Like I said, it's not productive.

You might say: that's what TigerLily is here for. Then you would have to address TigerLily first (maybe asking to follow all forums closely, or asking politely whether there is an answer?). Nobody did. devilish

(Or I must have missed it.)(*2) You should ask TigerLily, really.

[*2] Yes, I know there is a need for (quick) answers from the WCG Team, but it just doesn't work that way (especially if there isn't a question laughing). In general, they can't answer questions that aren't directed to them, or questions that aren't fair or are plain mean ("You should have done so and so, why didn't you?"). Apart from that, many users expect a reaction within a short time when they have a problem. It just doesn't work that way, especially when the problem first has to be investigated: how big it is, what its impact is, how to tackle it, whether people are available, who should do it, etc.

Adri
EDIT: All IMHO, of course. blushing

----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 4, 2023 11:15:37 AM]

[Jun 4, 2023 11:04:51 AM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846

Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Adri:
All good points. However, a little transparency goes a long way. Not every volunteer will be mollified by platitudes, but at least they know the problem has been acknowledged. I don't feel a short update of a sentence or two on a daily basis is asking too much. OK, I will stop my bellyaching as the point is now made.
On another note, in my results I show 495 completed SCC units, 103 of them listed as "error." I checked a couple of them and they have creation dates of June 3, 2023. So, the faulty work units are still being created. At least they do not take any time, so this will be solved eventually.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jun 4, 2023 12:07:17 PM]
