| World Community Grid Forums
Thread Status: Active Total posts in this thread: 32
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges:
|
I've just witnessed the weirdest chain of events in the Advanced view of the BOINC Manager. I'm going to give as much detail as possible to provide you with a complete picture:
I was running two tasks, one for CEP2 and one for HPF2, with no other tasks queued. I got one new HCC task and, even though the CEP2 task said it still had something like 6 or 7 hours to go, it suddenly started uploading its result. At the same time I noticed a new task coming in (so a 4th one), but in the very next second I got a "Computing error" status message (I was watching this in the "Tasks" tab), so I went into the Messages tab and this is what I got:
1/14/2011 9:49:59 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 9:49:59 PM World Community Grid Requesting new tasks for CPU
1/14/2011 9:50:05 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 9:50:07 PM World Community Grid Started download of ob330-339_ob337.fasta.gz
1/14/2011 9:50:07 PM World Community Grid Started download of ob330-339_ob337.psipred.gz
1/14/2011 9:50:08 PM World Community Grid Finished download of ob330-339_ob337.fasta.gz
1/14/2011 9:50:08 PM World Community Grid Started download of ob330-339_ob337.psipred_ss2.gz
1/14/2011 9:50:09 PM World Community Grid Finished download of ob330-339_ob337.psipred.gz
1/14/2011 9:50:09 PM World Community Grid Finished download of ob330-339_ob337.psipred_ss2.gz
1/14/2011 9:50:09 PM World Community Grid Started download of ob330-339_aaob33703_05.075_v1_3.gz
1/14/2011 9:50:09 PM World Community Grid Started download of ob330-339_aaob33709_05.075_v1_3.gz
1/14/2011 9:50:16 PM World Community Grid Finished download of ob330-339_aaob33703_05.075_v1_3.gz
1/14/2011 9:50:51 PM World Community Grid Finished download of ob330-339_aaob33709_05.075_v1_3.gz
1/14/2011 10:20:56 PM World Community Grid Computation for task E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0 finished
1/14/2011 10:20:57 PM World Community Grid Starting ob337_00021_1
1/14/2011 10:20:57 PM World Community Grid Starting task ob337_00021_1 using hpf2 version 617
1/14/2011 10:20:57 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:20:57 PM World Community Grid Requesting new tasks for CPU
1/14/2011 10:20:58 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_0
1/14/2011 10:20:58 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_1
1/14/2011 10:20:59 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:21:00 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_0
1/14/2011 10:21:00 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_2
1/14/2011 10:21:01 PM World Community Grid Started download of X0000059530955200511080944_X0000059530955200511080944.jp2
1/14/2011 10:21:03 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_1
1/14/2011 10:21:03 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_2
1/14/2011 10:21:03 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_3
1/14/2011 10:21:03 PM World Community Grid Started upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_4
1/14/2011 10:21:03 PM World Community Grid Finished download of X0000059530955200511080944_X0000059530955200511080944.jp2
1/14/2011 10:21:04 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_3
1/14/2011 10:21:58 PM World Community Grid Computation for task ob337_00021_1 finished
1/14/2011 10:21:58 PM World Community Grid Output file ob337_00021_1_0 for task ob337_00021_1 absent
1/14/2011 10:21:58 PM World Community Grid Starting X0000059530955200511080944_1
1/14/2011 10:21:58 PM World Community Grid Starting task X0000059530955200511080944_1 using hcc1 version 608
1/14/2011 10:23:05 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:23:05 PM World Community Grid Reporting 1 completed tasks, requesting new tasks for CPU
1/14/2011 10:23:08 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:23:23 PM World Community Grid Sending scheduler request: To fetch work.
1/14/2011 10:23:23 PM World Community Grid Requesting new tasks for CPU
1/14/2011 10:23:26 PM World Community Grid Scheduler request completed: got 1 new tasks
1/14/2011 10:23:28 PM World Community Grid Started download of X0000059530932200511080944_X0000059530932200511080944.jp2
1/14/2011 10:23:30 PM World Community Grid Finished download of X0000059530932200511080944_X0000059530932200511080944.jp2
1/14/2011 10:24:00 PM World Community Grid Finished upload of E200980_376_A.27.C22H16N2S2Si.335.0.set1d06_0_4
And here is the error message of the task:
Result Log
Result Name: ob337_00021_1
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\nblist.cc line:711
</stderr_txt>
]]>
I don't think it had anything to do with the CEP2 upload, which went through just fine and even got validated on the spot, but now I'm left wondering about two things:
1. How come the CEP2 project miscalculated the processing time by more than 6 hours? Not that I'm complaining, but it's weird, isn't it?
2. Why did the HPF2 task get an error?
OK, so maybe I have 3 questions:
3. What can I do to prevent this type of error?
Thank you all in advance!
Edit: I would have posted this in the HPF2 Forum, but I thought it's more of a BOINC-related topic. Feel free to move it, if you like.
Knowledge is limited. Imagination encircles the world! - Albert Einstein
[Edit 6 times, last edit by CandymanWCG at Jan 20, 2011 7:44:11 PM] |
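For anyone who wants to check their own message log for the same pattern, here is a minimal sketch in Python. It only assumes the line layout visible in the log above (date, time, project name, message); the names `summarize`, `SAMPLE_LOG` and the dictionary keys are made up for illustration and are not part of BOINC. The elapsed time it reports is simply the wall-clock gap between the "Starting task" and "Computation ... finished" lines, not CPU time.

```python
import re
from datetime import datetime

# Three lines copied from the message log above (the failed HPF2 task).
SAMPLE_LOG = """\
1/14/2011 10:20:57 PM World Community Grid Starting task ob337_00021_1 using hpf2 version 617
1/14/2011 10:21:58 PM World Community Grid Computation for task ob337_00021_1 finished
1/14/2011 10:21:58 PM World Community Grid Output file ob337_00021_1_0 for task ob337_00021_1 absent
"""

# Assumed layout: "M/D/YYYY H:MM:SS AM|PM <project name> <message>".
STAMP = r"(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M)"
STARTED = re.compile(STAMP + r" .*Starting task (\S+) using (\S+) version \d+")
FINISHED = re.compile(STAMP + r" .*Computation for task (\S+) finished")
ABSENT = re.compile(r"Output file \S+ for task (\S+) absent")


def parse_stamp(text):
    """Turn a log timestamp such as '1/14/2011 10:20:57 PM' into a datetime."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def summarize(log_text):
    """Return {task name: app, start time, elapsed seconds, output_absent flag}."""
    tasks = {}
    for line in log_text.splitlines():
        started = STARTED.search(line)
        finished = FINISHED.search(line)
        absent = ABSENT.search(line)
        if started:
            tasks[started.group(2)] = {
                "app": started.group(3),
                "started": parse_stamp(started.group(1)),
                "seconds": None,
                "output_absent": False,
            }
        elif finished and finished.group(2) in tasks:
            task = tasks[finished.group(2)]
            delta = parse_stamp(finished.group(1)) - task["started"]
            task["seconds"] = int(delta.total_seconds())
        elif absent and absent.group(1) in tasks:
            # A missing output file right after "finished" marks the failed run.
            tasks[absent.group(1)]["output_absent"] = True
    return tasks


summary = summarize(SAMPLE_LOG)
for name, task in summary.items():
    flag = "OUTPUT ABSENT" if task["output_absent"] else "ok so far"
    print(name, task["app"], task["seconds"], flag)
```

On the three sample lines, the HPF2 task shows up with a 61-second gap between start and finish and the output-absent flag set, which matches the "Computing error" seen in the Tasks tab.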
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
|
Firstly, CEP2 has a set 12 hr estimated time to complete and, depending on its progress, can be completed a lot earlier (i.e., don't go by the estimated time to complete for this particular project; the calculations simply can't compute it).
Secondly, HPF2 has had a very long-standing issue of WUs completing (aborting) virtually straight away. The Techs know about this and have attempted on numerous occasions to fix it, although it's one of those issues which is extremely hard to pin down (sometimes it happens, sometimes not...). Very frustrating, I know, but thankfully, no time is generally wasted on these WUs. If you get numerous HPF2 WUs which go the same way, I'd suggest de-selecting that project from your selection. |
||
|
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges:
|
Thanks gb009761! Not the answer I was hoping for, but an answer nonetheless. I hope this doesn't happen very often, though. At least my mind's at ease since it's not something "wrong" with my machine or anything.
Off-topic: just noticed your team. Are you an IBM UKI regular employee?
Cheers!
[Edit 1 times, last edit by CandymanWCG at Jan 14, 2011 9:18:09 PM] |
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
|
With regard to the HPF2 error, here are a few threads (amongst many) that have discussed it in the past...
- many errors onnly in THIS project
- anyone else seeing these kinds of errors? I'm getting tons of them.
- Updated HPF2 Science Applications
Happy reading...
As to being a regular IBM employee - I was, until last year when I got laid off. |
||
|
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges:
|
Thanks for the threads. I really didn't take the time to see if there was anything else out there about this topic. I guess I know better now...
Sorry to hear about your job, but I hope you've got something even better now. Btw, I'm a contractor, so I thought maybe we'd hook up on Sametime or something, but too bad. I guess we'll just use the chat thread if we have something to share.
Happy crunching! Cheers, mate! |
||
|
|
sk..
Master Cruncher http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges:
|
All sorts of system issues can cause what I describe as runaway errors: continual task errors after a short time (<15 sec). While these are rare, they regularly pop up in the forums - they stand out a mile.
It's good that you are running multiple projects. The best thing to do is to shut the system down completely and then start it up again; don't just do a restart. Then, if the problem reappears, you can start to narrow it down. |
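To make the "runaway errors" idea concrete, here is a minimal Python sketch. The 15-second cutoff comes from the post above; everything else (the function name `longest_runaway_streak`, the shape of the history list, and the sample values) is a hypothetical illustration, not anything BOINC itself provides.

```python
def longest_runaway_streak(results, max_seconds=15):
    """Length of the longest run of back-to-back failures shorter than max_seconds.

    results is a list of (elapsed_seconds, succeeded) pairs in the order
    the tasks ran. A success or a slower failure breaks the streak.
    """
    longest = 0
    current = 0
    for seconds, succeeded in results:
        if not succeeded and seconds < max_seconds:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hypothetical history: one good task, three quick failures in a row,
# another good task, then a single failure after roughly a minute.
history = [(3600, True), (5, False), (8, False), (3, False), (7200, True), (61, False)]
streak = longest_runaway_streak(history)
print("longest streak of quick failures:", streak)
```

A single isolated failure, like the one reported at the top of this thread, leaves the streak at zero or one; it is the long back-to-back streaks that point to a system-level problem worth a full shutdown.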
||
|
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges:
|
Hi skgiven, thanks for the input.
This was the first time I got the error (unfortunately I have a strong feeling it's not going to be the last). Since then, I haven't rebooted, but I have 2 other HPF2 tasks queued and ready to be crunched (and a couple more tasks too), so it's far from a "continual" error. Not yet, anyway. But I will keep that in mind if it should happen. Maybe it was the coincidence of having the CEP2 files uploading at the same time this task came in and started crunching... who knows? I'm just happy right now that I can keep on running tasks from all my selected projects.
Cheers! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With this elusive HPF2 bug, which used to be a /401 line error and, after the last attempt to fix it, shifted to a /711 line error (occurring an order of magnitude less often):
- Crashing only on Windows. Never on Mac or Linux (touch wood).
- Often failing only when in a multicore combo, when HFCC or FAAH are already running (personal observation on a Windows quad); never tested with a C4CW and CEP2 combo on my side.
- 99 out of 100 fail in the first seconds... no loss of computing time, just eating download bandwidth.
Run them in a mix on the specific devices that show this failure, or deselect the science completely for those devices by linking a specific device profile to the failing device(s). When running a science mix, the number of valid results from other sciences will ensure the continued supply of new tasks. Running HPF2 only, and always failing, eventually results in no work being sent except 1 task per day to test whether the problem has disappeared.
cheers
edit: spell
[Edit 1 times, last edit by Former Member at Jan 15, 2011 8:35:33 AM] |
||
|
|
sk..
Master Cruncher http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges:
|
Just an observation - I found HPF2 tasks failed en masse on 2003 servers. On several occasions I had 20 or 30 back-to-back failures, no new tasks for 24h, and even regular failures in a mixed-project setup. I couldn't trust them enough to leave them unattended for a couple of days, so in the end I just excluded that project from those systems.
|
||
|
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges:
|
@SekeRob I've just returned my first result for HPF2 (got validated right away too) and I'm processing another one, plus one in the queue. I would say I'm safe for now and I'd like to consider that error an accident. I don't plan on spending any time testing. We'll just wait and see. Che sera, sera!
@skgiven Luckily, I'm not in the situation of having a chain of tasks fail one after another. As I've told SekeRob too, I have returned a valid result for HPF2 and I have others downloaded and crunching. As for the 2003 server reference, I don't know if this is what you meant, but I'm running Win 7 Ultimate x64 on my machine.
Cheers to all! |
||
|
|
|