World Community Grid Forums
Thread Status: Active | Total posts in this thread: 48
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
You can also add: OPNG_00866582

I wonder whether the project team could remove all the unsent tasks from these faulty batches.
[Edit 1 times, last edit by erich56 at Aug 18, 2021 7:17:02 PM]
Greger
Cruncher | Joined: Aug 1, 2013 | Post Count: 29 | Status: Offline
Got 135 failed tasks so far today with this issue. They would not start, so it would not hurt.
Hope upcoming batches will work better.
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
"They would not start, so it would not hurt."
Yes, in a way you are right. On the other hand, as long as tasks come in this rarely, it's a pity if some of the few tasks that do get downloaded then fail to work.
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
I've looked through my account (multiple machines), and found examples of this particular type of error from batches:
OPNG_0086575
OPNG_0086577
OPNG_0086579
OPNG_0086584
OPNG_0086585
OPNG_0086589
OPNG_0086592
OPNG_0086594
OPNG_0086595
OPNG_0086596
OPNG_0086603
All with the "The number of atom types found ... does not match" error, and all since 01:40 UTC today.

add: OPNG_0086587
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
"Got 135 failed tasks so far today with this issue. They would not start, so it would not hurt."
Not always, though. I just watched a task which, when it reached 100% in the BOINC progress bar, showed "computation error" :-(
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 728 | Status: Offline
I've only had seven errors, but they're all the same across both rigs:

autogrid4: ERROR: The number of atom types found in the receptor PDBQT (8) does not match the number specified by the "receptor_types" command (7) in the GPF!
----------------------------------------
Currently being moderated under false pretences
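That autogrid4 message says the receptor .pdbqt contains more distinct AutoDock atom types than the GPF's receptor_types line declares. For anyone who wants to check a failed task locally, here is a minimal sketch, assuming the receptor .pdbqt and .gpf files from the task are on disk; the file names are placeholders, and the script assumes the AutoDock atom type is the last whitespace-separated field of each ATOM/HETATM record.

    # Minimal sketch, not project code: compare atom types in a receptor .pdbqt
    # with the receptor_types line of the matching .gpf.
    def pdbqt_atom_types(path):
        types = set()
        with open(path) as f:
            for line in f:
                if line.startswith(("ATOM", "HETATM")):
                    types.add(line.split()[-1])  # AutoDock type assumed to be the last field
        return types

    def gpf_receptor_types(path):
        with open(path) as f:
            for line in f:
                if line.startswith("receptor_types"):
                    fields = line.split("#")[0].split()  # drop any trailing comment
                    return set(fields[1:])               # skip the keyword itself
        return set()

    pdbqt_types = pdbqt_atom_types("receptor.pdbqt")  # placeholder file names
    gpf_types = gpf_receptor_types("receptor.gpf")
    print("types in PDBQT:", sorted(pdbqt_types))
    print("types in GPF:  ", sorted(gpf_types))
    print("missing from receptor_types:", sorted(pdbqt_types - gpf_types))

A non-empty difference on the last line would reproduce the 8-versus-7 mismatch autogrid4 is complaining about.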
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2495 | Status: Recently Active
So, why don't the admins/techs stop these WUs from being sent out? Not a word from any of them...
Acibant
Advanced Cruncher | USA | Joined: Apr 15, 2020 | Post Count: 126 | Status: Offline
"So, why don't the admins/techs stop these WUs from being sent out? Not a word from any of them..."
Clearly they were trying something different, given the difference in work unit sizes, but with those errors on so many I wish they had tried them first through the beta-testing route. Now there's a lot of wasted computation, as work units are sent to two different machines for validation for all those who lost their reliable status due to getting an error through no fault of their own.
erich56
Senior Cruncher | Austria | Joined: Feb 24, 2007 | Post Count: 300 | Status: Offline
I've looked through my account (multiple machines), and found examples of this particular type of error from batches:
OPNG_0086575
OPNG_0086577
OPNG_0086579
OPNG_0086584
OPNG_0086585
OPNG_0086589
OPNG_0086592
OPNG_0086594
OPNG_0086595
OPNG_0086596
OPNG_0086603
All with the "The number of atom types found ... does not match" error, and all since 01:40 UTC today.

add: OPNG_0086581
[Edit 1 times, last edit by erich56 at Aug 19, 2021 2:54:47 AM]
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 1317 | Status: Offline
"So, why don't the admins/techs stop these WUs from being sent out? Not a word from any of them..."
"Clearly they were trying something different, given the difference in work unit sizes, but with those errors on so many I wish they had tried them first through the beta-testing route. Now there's a lot of wasted computation, as work units are sent to two different machines for validation for all those who lost their reliable status due to getting an error through no fault of their own."

For what it's worth, it appears that all the failing batches are for a specific receptor, named 7aga_001_mgltools--LYS102. There are several other receptors of similar size in surrounding batches which don't seem to be throwing errors, so hopefully it's not going to be an ongoing problem as long as someone works out what's wrong with that batch.

Should they beta-test every single new receptor to avoid things like this? After all, what's going to constitute a big enough change to mean issues might be expected... It wouldn't be the first time we've had problems here when something was amiss in either data or parameters -- remember the misplaced grids that caused quite a lot of Invalid tasks in the second half of April 2021?

As for stopping the WUs being sent out, I suspect that by the time sensible action could've been taken the entire batch was probably already out in the field - it wouldn't take long to queue retries on tasks that fail in seconds, and I suspect a lot of them were being sent back as errors within a couple of hours of receipt (all mine were!)

I tend to agree about the potential loss of reliable status, but it is what it is...

Cheers - Al

P.S. Looking at the main data file for the problem batch, it is the only time I've ever seen HETATM data in an OPN1/OPNG receptor .pdbqt file. Probably a coincidence, but interesting...
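Following up on that P.S.: one way to see whether the HETATM records are what introduce the extra atom type is to split the types found in the .pdbqt by record kind. This is only an illustrative sketch; the file name below is hypothetical, and, as in the earlier sketch, the AutoDock type is assumed to be the last whitespace-separated field of each record.

    # Illustrative sketch only: list AutoDock atom types contributed by ATOM records
    # versus HETATM records in a receptor .pdbqt (file name is hypothetical).
    from collections import defaultdict

    def types_by_record(path):
        seen = defaultdict(set)
        with open(path) as f:
            for line in f:
                record = line[:6].strip()               # PDB-style record name in columns 1-6
                if record in ("ATOM", "HETATM"):
                    seen[record].add(line.split()[-1])  # type assumed to be the final field
        return seen

    seen = types_by_record("7aga_001_receptor.pdbqt")
    hetatm_only = seen["HETATM"] - seen["ATOM"]
    print("types from ATOM records:  ", sorted(seen["ATOM"]))
    print("types only in HETATM rows:", sorted(hetatm_only))

If that second set is non-empty for the 7aga_001_mgltools--LYS102 receptor, it would fit the HETATM coincidence noted above.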