Thread Status: Active
Total posts in this thread: 101
Posts: 101   Pages: 11
This topic has been viewed 20336 times and has 100 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Soon I will be running out of SCC1-tasks, apart from faulty batch 0004176. Faulty? Not entirely. You can fix it yourself! I did that with two tasks and one of them went Valid (see post 686839), just by putting an extra space (the single space that was missing) between "ATOM" and "62" in a file.

So what I will do now is to repair the remaining erroneous tasks from batch 0004176 in my queue, in the hopes that someone else will do as I did, so that the two partnered tasks (wingmen) will match and both go Valid.

If you are also running out of SCC1-tasks and are left with defective ones from batch 0004176, just give it a try. There is a tiny, slight chance that you will find a wingman such as me. All you need to do is this, as superuser:

# cd ~boinc/projects/www.worldcommunitygrid.org
# a=$(grep -l ^ATOM" 62 " [0-9a-f]*.pdbqt 2>/dev/null)
# [ -n "$a" ] && printf "HOT FIX for:\n%s\n" "$a" && sed -i 's/^ATOM 62 /ATOM  62 /' $a && ls -l $a
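The hot fix above targets one known line, which leaves open whether anything else in the same file is misaligned. Here is a minimal read-only sketch for checking that. It assumes the PDB fixed-column layout that PDBQT inherits, where the atom serial number of an ATOM/HETATM record is right-justified in columns 7-11. `check_serials` is a hypothetical helper made up for this example (not part of BOINC or WCG). It uses only POSIX awk and lists every ATOM/HETATM line whose columns 7-11 are not a right-justified integer. It checks that one field only.

```shell
# check_serials: hypothetical helper, not part of BOINC or WCG.
# Reports ATOM/HETATM lines whose atom serial number (columns 7-11 in the
# PDB fixed-column layout that PDBQT follows) is not a right-justified integer.
# Read-only: it never modifies the files it is given.
# Exit status: 0 if every checked line is aligned, 1 if any line is misaligned.
check_serials() {
    awk '
        /^(ATOM|HETATM)/ {
            serial = substr($0, 7, 5)          # columns 7-11
            # Valid: optional leading blanks, then digits ending in column 11.
            if (serial !~ /^ *[0-9]+$/) {
                printf "%s:%d: %s\n", FILENAME, FNR, $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

For example, after pasting the function into the root shell in the project directory, `check_serials [0-9a-f]*.pdbqt` prints each misaligned line as `file:line: text` and returns 1. It prints nothing and returns 0 if every serial number is in place. If it is called with no file arguments, awk reads standard input instead.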

Adri
[Jun 2, 2023 4:57:22 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Adri,

I wondered about doing that the first time round, but opted against it because I couldn't be sure there wasn't also something else wrong with the file that didn't show as a syntax error... So I just hope that is the only error in that data file :-)

One side-effect of users fixing the data file might be to disguise the problem, and if [as appears to be the case] they have suspended SCC1 supply with a view to identifying and removing the remains of the bad batch (as appeared to happen last time), any "repaired" jobs still out in the field probably won't count for anything (depending on how they "remove" the bad WUs...)

[Edit:] I thought they had suspended SCC1 to do some clean-up as there didn't seem to be any new SCC1 of any type for quite a long time... However, new SCC1 tasks started turning up late this afternoon, so perhaps it was just an overnight precaution (their time, not UTC...)

I'm more concerned about how a second bad batch got turned into active WUs after they'd had to deal with the first one -- if it had already been delivered by the scientists, could it not have been checked[1] (and either repaired before WU generation or suppressed, as appropriate!); if it was a new delivery, why hadn't the scientists checked the flex file and repaired it before shipping?

And if there are still more bad batches already in the pipeline, I hope they get culled or cured in advance :-)

Cheers - Al.

[1] I don't know how automated the process of accepting SCC1 work and making WUs is, so that might not be as easy as it sounds :-(

[Edited in light of the [apparent] resumption of SCC1 supply, including bad batch cases...]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jun 2, 2023 9:09:24 PM]
[Jun 2, 2023 7:14:36 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Well, half a day later, as we are definitely heading into the weekend, WCG is still pushing out the new faulty SCC1 batch. Just like last time.

And from WCG Towers, still crickets. :-(

Makes me wonder if their strategy is to just run through that batch until all of its tasks have errored out at the users, instead of cancelling the batch on the server side before it wastes anyone's bandwidth...

Ralf
[Jun 2, 2023 8:51:52 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Clever workaround, adriverhoef, but yeah, it won't address the root cause, and the whole batch will very likely be invalidated and re-issued anyway.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jun 2, 2023 9:17:59 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I can't abort these 4176 tasks fast enough. Keep getting sent new ones. Are WCG techs asleep at the wheel*?

* That's a joke. I'll be here all night.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jun 2, 2023 9:32:37 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I can't abort these 4176 tasks fast enough. Keep getting sent new ones. Are WCG techs asleep at the wheel*?

* That's a joke. I'll be here all night.

To save you being there all night: if you have thought about using BoincTasks, it will allow you to cancel all tasks that are ready to start. I do recommend setting "No new tasks" before cancelling the tasks waiting to start :-)
----------------------------------------

[Jun 2, 2023 11:28:40 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

At the moment - as I see it - if you abort your faulty tasks (from batch 0004176), your wingmen's tasks will be Server Aborted:
<8> * SCC1_0004176_MyoD1-C_6035_0  Fedora Linux  User Aborted    2023-06-02T03:22:07  2023-06-03T00:05:59
<8>   SCC1_0004176_MyoD1-C_6035_1  Linux Ubuntu  Server Aborted  2023-06-02T03:22:16  2023-06-03T00:10:06

So there isn't much point any more in fixing these faulty tasks and getting them to work, since as soon as a repaired (and finished) task is returned, the server will Server Abort all of the wingmen's tasks (if they're not running yet), so that the mended task will be marked Too Late sooner or later:
<15> * SCC1_0004176_MyoD1-C_1087_0  Fedora Linux  Too Late        2023-06-01T21:40:10  2023-06-02T20:50:37
<15>   SCC1_0004176_MyoD1-C_1087_1  LinuxMint     Server Aborted  2023-06-01T21:40:22  2023-06-02T22:44:59

Adri
[Jun 3, 2023 12:51:58 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Adri,

Thanks for posting about those, as I'd been aborting any I spotted but hadn't followed up to see what happened to them!

It looks as if something was done about these bad WUs some time between about 17:00 and 19:00 UTC on 2nd June (WCG afternoon shift?): any tasks of mine that failed (or that I aborted, if I spotted them) before that period ended up with retries (up to about the same time interval), whereas any tasks that were sent back (or aborted) after that didn't get retries!

As for those two examples, I think "Too Late" may also appear for returned tasks that are for "Don't need" cases, and as retries don't seem to be going out for MyoD1-C tasks any longer and tasks already out there are being Server Aborted it looks as if they may have [finally] marked the bad work units as unwanted!

An unwelcome current side-effect of whatever they've done is that the only available SCC1 work now seems to be retries for MyoD1-A/B work-units :-( -- It now being the weekend (or almost so in WCG's time-zone!), it'll be interesting to see if any new work shows up before Monday. In the two+ hours since I first posted this, the tap has been turned on again, and there are still occasional MyoD1-C tasks amongst them (but not many...)

I hope they post something about what is happening regarding the ongoing problems with MyoD1-C batches[1]...

Cheers - Al.

[1] And if that includes the information that the only thing wrong with the flex file was that missing space, it might legitimize your work-around :-) -- not that tampering with data files should ever be acceptable, even in what seems to be a good cause... :-) :-)
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jun 3, 2023 5:58:26 AM]
[Jun 3, 2023 3:22:47 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

I'm giving the HOT FIX a try on a Win10 machine. It's a quorum 1 workunit and still running.

https://www.worldcommunitygrid.org/contribution/workunit/312168628

EDIT: all in vain - Too Late / Quorum 1, Replication 2 :-(
----------------------------------------
[Edit 1 times, last edit by Crystal Pellet at Jun 3, 2023 10:06:20 AM]
[Jun 3, 2023 8:03:56 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: [Error] ATOM syntax incorrect: "62 " is not a valid atom number

Al,
An unwelcome current side-effect of whatever they've done is that the only available SCC1 work now seems to be retries for MyoD1-A/B work-units :-(

That may be the result of type A and B tasks needing wingmen to resolve the "unreliable" status that you get after processing any task at all from the faulty batch. Together, all the faulty tasks are creating an upturn (hausse) in type A and B tasks needing verification. Also, type C is still being sent out at a slow pace, because of the resends for types A and B needing verification. Importantly, the system is holding up and still hasn't collapsed.

Also, new workunits for types A and B are being distributed, albeit still sparsely.

[if] the only thing wrong with the flex file was that missing space, it might legitimize your work-around :-) -- not that tampering with data files should ever be acceptable, even in what seems to be a good cause... :-) :-)
Agreed. It seemed like a good idea at first, but in the end it only led to a lot of wasted cycles (and one Valid(*1)). It should probably never be acceptable, except as a way to point out and document the error.

Adri
[*1] (Output generated by 'wcgstats -frrre* SCC1_0004176_MyoD1-C_0299')
workunit 311931323
SCC1_0004176_MyoD1-C_0299_0  Fedora Linux  Valid  2023-06-01T21:21:18  2023-06-02T09:46:32  0.77/0.78  69.0/69.0
Logfile:
<core_client_version>7.20.2</core_client_version>
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[10:58:41] Number of tasks = 1
[10:58:41] Running task 0,CPU time at start of task 0 was 0.000000
[10:58:41] ./cmpd-1100299.pdbqt size = 19 3 ../../projects/www.worldcommunitygrid.org/scc1.MyoD1-C.pdbqt size = 1268 0
[11:45:29] Finished task #0 cpu time used 2784.904472
11:45:29 (1000920): called boinc_finish(0)

</stderr_txt>

PS Crystal Pellet, nice try!
----------------------------------------
[Edit 2 times, last edit by adriverhoef at Jun 3, 2023 11:16:18 AM]
[Jun 3, 2023 10:29:08 AM]