This topic has been viewed 3966 times and has 31 replies.
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
HPF2 Error computing [Closed]

I've just witnessed the weirdest chain of events in the Advanced view of BOINC Manager. I'll describe it in as much detail as possible to give you the complete picture:

I was running two tasks, one for CEP2 and one for HPF2, with no other tasks queued. Then I got a new HCC task, and even though the CEP2 task still showed something like 6 or 7 hours remaining, it suddenly started uploading its result. At the same time I noticed another new task coming in (a fourth one), but the very next second I got a "Computing error" status message (I was watching this in the "Tasks" tab), so I went to the "Messages" tab and this is what I found:

1/14/2011 9:49:59 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 9:49:59 PM World Community Grid Requesting new tasks for CPU
1/14/2011 9:50:05 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 9:50:07 PM World Community Grid Started download of ob330-339_ob337.fasta.gz
1/14/2011 9:50:07 PM World Community Grid Started download of ob330-339_ob337.psipred.gz
1/14/2011 9:50:08 PM World Community Grid Finished download of ob330-339_ob337.fasta.gz
1/14/2011 9:50:08 PM World Community Grid Started download of ob330-339_ob337.psipred_ss2.gz
1/14/2011 9:50:09 PM World Community Grid Finished download of ob330-339_ob337.psipred.gz
1/14/2011 9:50:09 PM World Community Grid Finished download of ob330-339_ob337.psipred_ss2.gz
1/14/2011 9:50:09 PM World Community Grid Started download of ob330-339_aaob33703_05.075_v1_3.gz
1/14/2011 9:50:09 PM World Community Grid Started download of ob330-339_aaob33709_05.075_v1_3.gz
1/14/2011 9:50:16 PM World Community Grid Finished download of ob330-339_aaob33703_05.075_v1_3.gz
1/14/2011 9:50:51 PM World Community Grid Finished download of ob330-339_aaob33709_05.075_v1_3.gz
1/14/2011 10:20:56 PM World Community Grid Computation for task E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0 finished
1/14/2011 10:20:57 PM World Community Grid Starting ob337_00021_1
1/14/2011 10:20:57 PM World Community Grid Starting task ob337_00021_1 using hpf2 version 617
1/14/2011 10:20:57 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:20:57 PM World Community Grid Requesting new tasks for CPU
1/14/2011 10:20:58 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_0
1/14/2011 10:20:58 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_1
1/14/2011 10:20:59 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:21:00 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_0
1/14/2011 10:21:00 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_2
1/14/2011 10:21:01 PM World Community Grid Started download of X0000059530955200511080944_X0000059530955200511080944.jp2
1/14/2011 10:21:03 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_1
1/14/2011 10:21:03 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_2
1/14/2011 10:21:03 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_3
1/14/2011 10:21:03 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_4
1/14/2011 10:21:03 PM World Community Grid Finished download of X0000059530955200511080944_X0000059530955200511080944.jp2
1/14/2011 10:21:04 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_3
1/14/2011 10:21:58 PM World Community Grid Computation for task ob337_00021_1 finished
1/14/2011 10:21:58 PM World Community Grid Output file ob337_00021_1_0 for task ob337_00021_1 absent
1/14/2011 10:21:58 PM World Community Grid Starting X0000059530955200511080944_1
1/14/2011 10:21:58 PM World Community Grid Starting task X0000059530955200511080944_1 using hcc1 version 608
1/14/2011 10:23:05 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:23:05 PM World Community Grid Reporting 1 completed tasks, requesting new tasks for CPU
1/14/2011 10:23:08 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:23:23 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:23:23 PM World Community Grid Requesting new tasks for CPU
1/14/2011 10:23:26 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:23:28 PM World Community Grid Started download of X0000059530932200511080944_X0000059530932200511080944.jp2
1/14/2011 10:23:30 PM World Community Grid Finished download of X0000059530932200511080944_X0000059530932200511080944.jp2
1/14/2011 10:24:00 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_4

And here is the error message of the task:

Result Log

Result Name: ob337_00021_1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\nblist.cc line:711

</stderr_txt>
]]>

I don't think it had anything to do with the CEP2 upload, which went fine and was even validated on the spot, but now I'm left wondering about two things:
1. How come the CEP2 project overestimated the processing time by more than 6 hours? Not that I'm complaining, but it's weird, isn't it?
2. Why did the HPF2 task get an error?
OK, so maybe I have three questions:
3. What can I do to prevent this type of error?

Thank you all in advance!

Edit: I would have posted this in the HPF2 forum, but I thought it was more of a BOINC-related topic. Feel free to move it if you like.
----------------------------------------
Knowledge is limited. Imagination encircles the world! - Albert Einstein



----------------------------------------
[Edit 6 times, last edit by CandymanWCG at Jan 20, 2011 7:44:11 PM]
[Jan 14, 2011 8:50:31 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Re: HPF2 Error computing

Firstly, CEP2 has a fixed 12-hour estimated time to completion and, depending on its progress, a task can finish a lot earlier. In other words, don't go by the estimated time to completion for this particular project; the client simply can't compute it accurately.

Secondly, HPF2 has had a very long-standing issue of WUs completing (aborting) virtually straight away. The techs know about this and have tried on numerous occasions to fix it, but it's one of those issues that is extremely hard to pin down (sometimes it happens, sometimes not). Very frustrating, I know, but thankfully little or no crunching time is generally wasted on these WUs. If you get numerous HPF2 WUs that fail the same way, I'd suggest deselecting that project.
----------------------------------------

[Jan 14, 2011 9:12:36 PM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Re: HPF2 Error computing

Thanks, gb009761! Not the answer I was hoping for, but an answer nonetheless. I hope this doesn't happen very often, though. At least my mind's at ease, since it's not something "wrong" with my machine.

Off-topic: I just noticed your team. Are you a regular IBM UKI employee?

Cheers!
[Edit 1 times, last edit by CandymanWCG at Jan 14, 2011 9:18:09 PM]
[Jan 14, 2011 9:17:44 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Re: HPF2 Error computing

With regard to the HPF2 error, here are a few threads (among many) that have discussed it in the past:

many errors onnly in THIS project
anyone else seeing these kinds of errors? I'm getting tons of them.
Updated HPF2 Science Applications

Happy reading...

As for being a regular IBM employee: I was, until last year, when I got laid off.
----------------------------------------

[Jan 14, 2011 9:20:27 PM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Re: HPF2 Error computing

Thanks for the threads. I didn't take the time to check whether anything else had been posted about this topic. I guess I know better now...

Sorry to hear about your job; I hope you've found something even better now. By the way, I'm a contractor, so I thought maybe we could connect on Sametime or something, but too bad. I guess we'll just use the chat thread if we have something to share.

Happy crunching! Cheers, mate!
[Jan 14, 2011 9:34:41 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: HPF2 Error computing

All sorts of system issues can cause what I call runaway errors: continual task errors after a short run time (<15 s). While these are rare, they regularly pop up in the forums, and they stand out a mile.
It's good that you are running multiple projects.
The best thing to do is to shut the system down completely and then start it up again; don't just do a restart. Then, if the problem reappears, you can start to narrow it down. Good luck!
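One way to start narrowing it down is to scan the BOINC Manager messages for tasks that end almost immediately. The Python sketch below is a hypothetical helper, not a BOINC or WCG tool: it assumes the message format quoted in the first post (date, time, "World Community Grid", then the message text), and the 15-second default threshold simply mirrors the <15 s figure above. It also flags any task whose output file was reported absent.

```python
import re
from datetime import datetime

# Hypothetical helper: assumes the BOINC Manager message format quoted in this
# thread, e.g. "1/14/2011 10:20:57 PM World Community Grid Starting task ...".
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) World Community Grid (.*)$"
)
START_RE = re.compile(r"^Starting task (\S+) using (\S+) version")
FINISH_RE = re.compile(r"^Computation for task (\S+) finished")
ABSENT_RE = re.compile(r"^Output file \S+ for task (\S+) absent")


def find_suspect_tasks(lines, max_seconds=15):
    """Return (task, app, seconds, output_absent) for each task that finished
    within max_seconds of starting or whose output file was reported absent."""
    started = {}    # task name -> (app name, start time)
    finished = {}   # task name -> finish time
    absent = set()  # tasks with an "Output file ... absent" message
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that don't follow the assumed format
        when = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        msg = m.group(2)
        s = START_RE.match(msg)
        f = FINISH_RE.match(msg)
        a = ABSENT_RE.match(msg)
        if s:
            started[s.group(1)] = (s.group(2), when)
        elif f:
            finished[f.group(1)] = when
        elif a:
            absent.add(a.group(1))

    suspects = []
    for task, (app, start) in started.items():
        end = finished.get(task)
        if end is None:
            continue  # still running, or its start is not in the log excerpt
        seconds = (end - start).total_seconds()
        if seconds <= max_seconds or task in absent:
            suspects.append((task, app, seconds, task in absent))
    return suspects
```

Fed the HPF2 lines from the first post, it flags ob337_00021_1 with app hpf2: the task ran for 61 seconds, which is above the threshold, but its output file was reported absent. Tracking the app name alongside each task makes it easy to see whether the failures cluster on one project, as described for HPF2 in this thread.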
[Jan 14, 2011 10:08:39 PM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Re: HPF2 Error computing

Hi skgiven, thanks for the input.

This was the first time I got the error (unfortunately, I have a feeling it won't be the last). I haven't rebooted since then, but I have two other HPF2 tasks queued and ready to be crunched (plus a couple of other tasks), so it's far from a "continual" error. Not yet, anyway. But I'll keep your advice in mind if it happens.

Maybe it was the coincidence of the CEP2 files uploading at the same time this task came in and started crunching... who knows? Right now I'm just happy that I can keep running tasks from all my selected projects.

Cheers!
[Jan 14, 2011 10:44:33 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: HPF2 Error computing

About this elusive HPF2 bug, which used to be a line 401 error and, after the last attempted fix, shifted to a line 711 error (now occurring an order of magnitude less often):

- It crashes only on Windows, never on Mac or Linux (touch wood).
- It often fails only in a multicore mix, when HFCC or FAAH are already running (my personal observation on a Windows quad-core). I have never tested it in combination with C4CW and CEP2.
- 99 out of 100 fail within the first few seconds, so no computing time is lost; they just eat download bandwidth.

On devices that show this failure, either run HPF2 in a mix with other sciences, or deselect the science completely for those devices by linking them to a dedicated device profile.

When running a science mix, the valid results from the other sciences ensure a continued supply of new tasks. Running only HPF2, with every task failing, eventually results in no work being sent except one task per day to test whether the problem has disappeared.

cheers

edit: spell
----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 15, 2011 8:35:33 AM]
[Jan 15, 2011 8:31:50 AM]
sk..
Master Cruncher
http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif
Joined: Mar 22, 2007
Post Count: 2324
Re: HPF2 Error computing

Just an observation: I found that HPF2 tasks failed en masse on 2003 servers. On several occasions I had 20 or 30 back-to-back failures followed by no new tasks for 24 hours, and even regular failures in a mixed-project setup. I couldn't trust them enough to leave them running for a couple of days, so in the end I just excluded that project from those systems.
[Jan 15, 2011 1:03:35 PM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Re: HPF2 Error computing

@SekeRob I've just returned my first HPF2 result (it was validated right away, too) and I'm processing another one, with one more in the queue. I'd say I'm safe for now, and I'd like to consider that error an accident. I don't plan on spending any time testing. We'll just wait and see. Che sera, sera!

@skgiven Luckily, I'm not in the situation of having a chain of tasks fail one after another. As I told SekeRob, I have returned a valid HPF2 result and have others downloaded and crunching. As for the 2003 server reference, I'm not sure whether this is what you meant, but I'm running Win 7 Ultimate x64 on my machine.

Cheers to all!
[Jan 15, 2011 3:38:27 PM]