Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 1252 times and has 7 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
hb0xx WUs suffering lots of startup errors

Just a heads up for the grid techs: The recent hb0xx series of WUs are having a lot of problems with errors where the job aborts immediately upon startup. One of mine died with this error:

<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
length mismatch in frag file
total_residue: 0 length 2

</stderr_txt>

but it's not just me. I'm seeing WUs in the log with 4 or 5 "Error" statuses returned, and 0 time claimed for each (e.g. hb052_01, hb062_03). None of these show any Valid returns, just Error or In Progress, so the In Progress ones probably haven't tried to start yet.

But not all hb0xx units are borked; I'm just about to finish hb053_06_1 with no apparent problems.

For what it's worth.
[May 31, 2006 8:30:48 AM]
olympic
Senior Cruncher
Joined: Jun 12, 2005
Post Count: 156
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

Same here: 2 errors so far, and the WUs were re-issued several more times with more errors and zero CPU time. I have 3 more waiting in line; we'll see how those go.
[May 31, 2006 8:57:53 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

I wonder... if a work unit returns 3 or more errors, BOINC has a quorum result. I would guess that it marks it complete and kicks it out for analysis. The 4 or 5 results you see would be due to the normal BOINC retry policy - when it has 1 or 2 error results, it would send extra copies out in the hope of reaching a valid quorum.

Thanks for reporting it, but I really expect BOINC to have reported it automatically.
[May 31, 2006 9:19:39 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

Thanks for letting us know - there are significantly more errors with the hb batch than any others we have sent out previously. We will take a look and let people know what is going on.

We have BOINC configured to mark a workunit as an error once 4 results have been returned as an error. At that point no new work is issued for the workunit and we take a look.

Kevin
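
As a rough illustration of the rule described above (a workunit is marked as an error once 4 results have come back as errors, and no new work is issued after that), here is a minimal sketch. It is not BOINC's server code: the class, method names, and constant are hypothetical, and only the threshold of 4 comes from the post.

```python
from dataclasses import dataclass

# Hypothetical constant; the value 4 is the threshold stated in the post.
MAX_ERROR_RESULTS = 4


@dataclass
class Workunit:
    """Toy model of one workunit's error bookkeeping (illustrative only)."""
    name: str
    error_results: int = 0
    errored_out: bool = False

    def record_error(self) -> None:
        """Count one errored result and flag the workunit at the threshold."""
        if self.errored_out:
            return  # already flagged; nothing more to count
        self.error_results += 1
        if self.error_results >= MAX_ERROR_RESULTS:
            self.errored_out = True

    def may_issue_more_work(self) -> bool:
        """New copies are sent out only while the workunit is not flagged."""
        return not self.errored_out
```

Under these assumptions, a workunit with 3 errored results would still be re-issued, and the 4th error would stop further copies from going out. Copies already sent before that point would still show as In Progress.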
----------------------------------------
[Edit 1 times, last edit by knreed at May 31, 2006 2:00:10 PM]
[May 31, 2006 1:59:21 PM]
Goku
Advanced Cruncher
France - Caen (Calvados / Normandie)
Joined: Nov 30, 2004
Post Count: 84
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

Same problem for hb089_06

hb089_06 Other 05/31/2006 11:57:52 05/31/2006 17:23:48 4.29 23 / 0
hb089_06 Error 05/31/2006 07:49:29 05/31/2006 14:19:12 3.94 26 / 0
hb089_06 Error 05/31/2006 07:38:50 05/31/2006 11:53:29 1.97 13 / 0
hb089_06 Other 05/31/2006 07:37:22 05/31/2006 12:52:46 2.11 25 / 0

and for ha008_04 ?

ha008_04 Error 05/31/2006 17:42:27 05/31/2006 18:11:35 0.00 0 / 0
ha008_04 Error 05/31/2006 17:23:49 05/31/2006 17:33:59 0.00 0 / 0
ha008_04 Error 05/31/2006 17:08:12 05/31/2006 17:18:27 0.00 0 / 0
ha008_04 In Progress 05/31/2006 17:07:02 06/07/2006 17:07:02 0.00 0 / 0
ha008_04 In Progress 05/31/2006 17:00:14 06/07/2006 17:00:14 0.00 0 / 0
ha008_04 Other 01/01/1970 00:00:00 01/01/1970 00:00:00 0.00 0 / 0
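
For anyone who wants to check a pasted status listing for this pattern, here is a small sketch that counts, per workunit, the rows that are "Error" with zero CPU time. It assumes each row has the layout seen above: name, status, two timestamps, a CPU-time figure, and a "claimed / granted" pair. The column meanings are my reading of the rows, not something confirmed in this thread, and the function and pattern names are made up for the example.

```python
import re
from collections import Counter

# Assumed row layout: name, status, two timestamps, CPU time, "claimed / granted".
TIMESTAMP = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<status>.+?)\s+"
    rf"(?P<first_time>{TIMESTAMP})\s+(?P<second_time>{TIMESTAMP})\s+"
    r"(?P<cpu_time>\d+\.\d+)\s+(?P<claimed>\d+) / (?P<granted>\d+)$"
)


def startup_failures(lines):
    """Count rows per workunit whose status is 'Error' and CPU time is zero."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a result row
        if match["status"] == "Error" and float(match["cpu_time"]) == 0.0:
            counts[match["name"]] += 1
    return counts
```

Fed the rows in this post, it would flag ha008_04 but not hb089_06, since the hb089_06 errors came back with CPU time on them.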
----------------------------------------
[Edit 1 times, last edit by MaitreYoda at May 31, 2006 7:26:18 PM]
[May 31, 2006 7:24:05 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

Guys, knreed wrote:
"We have seen a large number of errors with the hbXXX series of workunits for Human Proteome Folding on BOINC. We are looking into the problem now and will let you know what we find."

That means anything starting with hb... you can stop those and try others, or try FAAH to help find a cure for HIV.
[May 31, 2006 8:55:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

The latest on this is here: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=7442

No more HPF units are being sent out at the moment - switch to FA@H until it's sorted.
[Jun 1, 2006 10:42:23 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: hb0xx WUs suffering lots of startup errors

We have re-enabled the Human Proteome Folding project and are running workunits from batch 'ex'.
[Jun 1, 2006 2:15:06 PM]