World Community Grid - View Thread

World Community Grid Forums

Category: Completed Research

Forum: FightAIDS@Home

Thread: Error

Thread Status: Active
Total posts in this thread: 24

This topic has been viewed 4951 times and has 23 replies

foxfire
Advanced Cruncher
United States
Joined: Sep 1, 2007
Post Count: 121
Status: Offline

Re: Error

Over the past 3 days I've gotten errors on:

OS           Errors
Linux        68
Win7 (64)    16
WinXP (32)    2

I think the only reason I'm seeing more on Linux is that I have more PCs running it than the other OSes, and Linux processes them faster.

----------------------------------------

[Mar 25, 2014 1:58:39 PM]

Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Status: Offline

Re: Error

Correct me if I'm wrong, but FAAH/FAHV work units are run without a duplicate on computers where the last few days' results have been reliable.

So while these work units aren't directly wasting any time, there will be plenty of computers that are no longer deemed trustworthy, and that will halve crunching efficiency on any affected system.
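To put a number on the halving, here is a rough sketch. It assumes, as the post above does and without confirmation from the project, that a trusted computer needs one copy per work unit and an untrusted one needs two. The function name is hypothetical.

```python
def useful_work_units(results_returned, copies_per_work_unit):
    """Work units completed when each one needs this many returned copies.

    Assumption from the post above: 1 copy on a trusted host, 2 otherwise.
    """
    return results_returned / copies_per_work_unit


trusted = useful_work_units(100, 1)    # host still deemed reliable
untrusted = useful_work_units(100, 2)  # host that lost its reliable status
print(trusted, untrusted, untrusted / trusted)
```

Under that assumption the same 100 returned results cover half as many work units once a duplicate is required.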

[Mar 25, 2014 3:41:46 PM]

Tex1954
Cruncher
Joined: Nov 3, 2005
Post Count: 3
Status: Offline

Re: Error

"I experienced this issue only on Linux. Is anybody getting this error on Windows?"

Read up a little... my errors are on Windows 7 64-bit machines.

:)

[Mar 25, 2014 6:17:13 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Error

Sorry for misleading you...

I hit plenty of error WUs on my Windows machines last night, on both Vista and 7. I realised this by tracking the scheduling history. Actually, no faah WUs have been downloaded on my Windows machines for a week, only FAHVs, while no FAHVs were downloaded on Linux in this period.

It seems the team has stopped sending out new faah WUs. I can see no faahs on the Windows machines, and all faahs on the Linux machines are re-sent WUs, which may end up with an error.

Kiyo.

[Mar 26, 2014 1:17:25 AM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Error

Yes, there was an issue with the way work units were loaded. We have stopped sending out new work for those and I have recently cleared out the error work units so we can rebuild them. My plan at the moment is to resume the project during normal hours.

Note: This is only for FightAIDS@Home - Autodock version.

Thank you for your patience,
-Uplinger

[Mar 26, 2014 6:50:56 AM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Error

I have re-enabled FightAIDS@Home - Autodock; work should start being distributed shortly.

Thanks again for your patience and participation,
-Uplinger

[Mar 26, 2014 3:59:55 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Error

Now I'm getting faah WUs. Hope they run with no problem.

Thanks. Kiyo.

[Mar 26, 2014 11:18:54 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Error

I only rarely visit specific project forums, but on this one I had to go to page 3 before finding the thread I was looking for; 16 sticky posts occupy the top of the faah forum. How effective is that?

So this error is (these errors are) prolific. I only just now looked at a set-and-forget box and saw that it has reached a daily quota of one, so its idle cores have been munching on SIMAP jobs as the backup project. I don't care so much what they compute on, as long as they compute when left on.

This is the error logged:

Result Name: FAHV_ x3NF6_ B_ IN_ Y3a_ rig_ 0206730_ 3091_ 1--
<core_client_version>7.2.34</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:

These are some event log extracts

1332 World Community Grid 27-03-2014 09:40 Started download of fahv_image06_7.20.tga
1333 World Community Grid 27-03-2014 09:40 Finished download of fahv_image06_7.20.tga
1334 World Community Grid 27-03-2014 09:40 [error] Unable to verify fahv_image06_7.20.tga using certificates
1335 World Community Grid 27-03-2014 09:40 [error] Checksum or signature error for fahv_image06_7.20.tga

and

1408 World Community Grid 27-03-2014 15:21 Requesting new tasks for CPU
1409 World Community Grid 27-03-2014 15:21 [sched_op] CPU work request: 43200.00 seconds; 2.00 devices
1410 World Community Grid 27-03-2014 15:21 Scheduler request completed: got 0 new tasks
1411 World Community Grid 27-03-2014 15:21 [sched_op] Server version 701
1412 World Community Grid 27-03-2014 15:21 No tasks sent
1413 World Community Grid 27-03-2014 15:21 No tasks are available for FightAIDS@Home - Vina
1414 World Community Grid 27-03-2014 15:21 No tasks are available for FightAIDS@Home - AutoDock
1415 World Community Grid 27-03-2014 15:21 No tasks are available for the applications you have selected.
1416 World Community Grid 27-03-2014 15:21 This computer has finished a daily quota of 1 tasks
1417 World Community Grid 27-03-2014 15:21 Project requested delay of 121 seconds
1418 World Community Grid 27-03-2014 15:21 [sched_op] Deferring communication for 00:02:01
1419 World Community Grid 27-03-2014 15:21 [sched_op] Reason: requested by project

Hitting update 6 hours later earned the same log response; presumably the node gets another chance some time after midnight. As the project is now on a very long back-off counter, please do not rush to fix it.

[Mar 27, 2014 8:29:30 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Error

The 24 hours have passed and nothing has improved: the quota used is set to 1 again, though 2 fahv tasks were sent. They failed the CRC/MD5 checks, probably because a new version is being pushed in connection with the new WCG logo:

World Community Grid	3/28/2014 8:35:06 AM	[sched_op] Starting scheduler request	
World Community Grid	3/28/2014 8:35:06 AM	Sending scheduler request: To fetch work.	
World Community Grid	3/28/2014 8:35:06 AM	Requesting new tasks for CPU	
World Community Grid	3/28/2014 8:35:06 AM	[sched_op] CPU work request: 16901.49 seconds; 2.00 devices	
World Community Grid	3/28/2014 8:35:10 AM	Scheduler request completed: got 2 new tasks	
World Community Grid	3/28/2014 8:35:10 AM	[sched_op] Server version 701	
World Community Grid	3/28/2014 8:35:10 AM	Project requested delay of 121 seconds	
World Community Grid	3/28/2014 8:35:10 AM	[sched_op] estimated total CPU task duration: 123417 seconds	
World Community Grid	3/28/2014 8:35:10 AM	[sched_op] Deferring communication for 00:02:01	
World Community Grid	3/28/2014 8:35:10 AM	[sched_op] Reason: requested by project	
World Community Grid	3/28/2014 8:35:12 AM	Started download of wcgrid_fahv_vina_7.20_windows_intelx86	
World Community Grid	3/28/2014 8:35:12 AM	Started download of wcgrid_fahv_vina_prod_32.exe.7.20	
World Community Grid	3/28/2014 8:35:15 AM	Finished download of wcgrid_fahv_vina_7.20_windows_intelx86	
World Community Grid	3/28/2014 8:35:15 AM	Finished download of wcgrid_fahv_vina_prod_32.exe.7.20	
World Community Grid	3/28/2014 8:35:15 AM	Started download of wcgrid_fahv_graphics_prod_32.exe.7.20	
World Community Grid	3/28/2014 8:35:15 AM	Started download of fahv_image01_7.20.tga	
World Community Grid	3/28/2014 8:35:15 AM	[error] Unable to verify wcgrid_fahv_vina_7.20_windows_intelx86 using certificates	
World Community Grid	3/28/2014 8:35:15 AM	[error] Checksum or signature error for wcgrid_fahv_vina_7.20_windows_intelx86	
World Community Grid	3/28/2014 8:35:15 AM	[error] Unable to verify wcgrid_fahv_vina_prod_32.exe.7.20 using certificates	
World Community Grid	3/28/2014 8:35:15 AM	[error] Checksum or signature error for wcgrid_fahv_vina_prod_32.exe.7.20	
World Community Grid	3/28/2014 8:35:17 AM	[sched_op] Deferring communication for 00:01:56	
World Community Grid	3/28/2014 8:35:17 AM	[sched_op] Reason: Unrecoverable error for task FAHV_x3NF6_B_IN_Y3b_rig_0206858_0226_2	
World Community Grid	3/28/2014 8:35:17 AM	[sched_op] Deferring communication for 00:02:56	
World Community Grid	3/28/2014 8:35:17 AM	[sched_op] Reason: Unrecoverable error for task FAHV_x3NF6_B_IN_Y3b_rig_0206814_0843_2
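To list which downloads failed verification in a paste like the one above, something along these lines may help. It is a sketch only: it assumes tab-separated columns (project, timestamp, message) as in the paste, and the helper name `failed_files` is hypothetical.

```python
# Two lines copied from the log above; columns are assumed to be tab-separated.
LOG = (
    "World Community Grid\t3/28/2014 8:35:15 AM\t"
    "[error] Unable to verify wcgrid_fahv_vina_7.20_windows_intelx86 using certificates\t\n"
    "World Community Grid\t3/28/2014 8:35:15 AM\t"
    "[error] Checksum or signature error for wcgrid_fahv_vina_7.20_windows_intelx86\t\n"
    "World Community Grid\t3/28/2014 8:35:15 AM\t"
    "[error] Checksum or signature error for wcgrid_fahv_vina_prod_32.exe.7.20\t\n"
)

PREFIX = "[error] Checksum or signature error for "


def failed_files(log_text):
    """Return file names reported with a checksum or signature error, in log order."""
    files = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a project/timestamp/message line
        message = fields[2].strip()
        if message.startswith(PREFIX):
            files.append(message[len(PREFIX):])
    return files


print(failed_files(LOG))
```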

SIMAP continues to fill the gap. I will learn in another 24 hours whether the technicians got around to fixing this, and the overabundance of old-news stickies.

cheerio

[Mar 28, 2014 10:36:14 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Error

Is anyone on the WCG team even reading? 24 hours later, exactly the same thing: not one but seven faah tasks assigned, all going out with the same download failure error.

faah763212_ ZINC01646278_ x3NF8ledgfA0335_ 00_ 1-- 2718223 Error 3/29/14 07:43:00 3/29/14 07:48:52 0.00 / 0.00 46.6 / 0.0
faah763210_ ZINC00615883_ x3NF8ledgfA0333_ 00_ 1-- 2718223 Error 3/29/14 07:39:22 3/29/14 07:43:00 0.00 / 0.00 46.6 / 0.0
faah763210_ ZINC04522231_ x3NF8ledgfA0333_ 00_ 1-- 2718223 Error 3/29/14 07:39:22 3/29/14 07:43:00 0.00 / 0.00 46.6 / 0.0
faah763183_ ZINC01706126_ x3NF8ledgfA0306_ 00_ 1-- 2718223 Error 3/29/14 06:52:50 3/29/14 06:58:57 0.00 / 0.00 48.8 / 0.0
faah763181_ ZINC01676213_ x3NF8ledgfA0304_ 00_ 1-- 2718223 Error 3/29/14 06:48:36 3/29/14 06:50:44 0.00 / 0.00 48.8 / 0.0
faah763180_ ZINC01196937_ x3NF8ledgfA0303_ 00_ 0-- 2718223 Error 3/29/14 06:44:47 3/29/14 06:48:36 0.00 / 0.00 48.8 / 0.0
faah763180_ ZINC01621981_ x3NF8ledgfA0303_ 00_ 0-- 2718223 Error 3/29/14 06:44:47 3/29/14 06:48:36 0.00 / 0.00 48.8 / 0.0

Result Log

Result Name: faah763212_ ZINC01646278_ x3NF8ledgfA0335_ 00_ 1--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_faah_7.16_windows_intelx86</file_name>
<error_code>-123 (no signature)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wcgrid_faah_autodock_prod_graphics.exe.7.16</file_name>
<error_code>-123 (no signature)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>faah_image01_7.16.tga</file_name>
<error_code>-123 (no signature)</error_code>
</file_xfer_error>

</message>
]]>
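The `<message>` part of a result log like this can be picked apart mechanically. The following is a sketch, assuming the `<message>` element is complete and well-formed as it is in this paste (the `CDATA` wrapper is left out); `transfer_errors` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# The <message> element from the result log above, shortened to two entries.
MESSAGE = """<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_faah_7.16_windows_intelx86</file_name>
<error_code>-123 (no signature)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>faah_image01_7.16.tga</file_name>
<error_code>-123 (no signature)</error_code>
</file_xfer_error>
</message>"""

# Matches text such as "-123 (no signature)".
CODE_RE = re.compile(r"(-?\d+)\s*\((.*)\)")


def transfer_errors(message_xml):
    """Return (file_name, numeric_code, description) for each file_xfer_error."""
    root = ET.fromstring(message_xml)
    errors = []
    for node in root.iter("file_xfer_error"):
        name = node.findtext("file_name", default="").strip()
        raw = node.findtext("error_code", default="").strip()
        match = CODE_RE.fullmatch(raw)
        if match:
            errors.append((name, int(match.group(1)), match.group(2)))
        else:
            errors.append((name, None, raw))  # keep unparsed text as-is
    return errors


for entry in transfer_errors(MESSAGE):
    print(entry)
```

Every file in this result reports the same code, -123 (no signature).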

Scanning the forums, there appears to be a longer history of problems with application content delivery. Regrettably, only faah tasks are being sent, no fahv, so it is not possible to tell whether just this one application is affected or more. It is the weekend, meaning this device will most probably be doing SIMAP for the next 72 hours at least; there has never been an incident there.

Just to go through the motions, I did a project reset, detached, and installed the 7.2.47 client over 7.0.34, but none of it had any effect on the issue.

Oh, the Linux node with 7.2.42 now has the same problem, but there WCG is just a sideshow, as four more active projects are attached to that agent.

cheerios

[Mar 29, 2014 9:45:26 AM]
