| World Community Grid Forums
Thread Status: Active Total posts in this thread: 10
|
|
justakiwibird
Cruncher New Zealand Joined: Mar 8, 2006 Post Count: 41 Status: Offline Project Badges:
|
I'm crunching F@H and the new cancer project (at least that's what I've set in my project preferences)
For some time now, almost all of the WUs I crunch are returning the "exited with zero status" error at least once, and often three or four times. Most of them do eventually complete, but it's wasting a heck of a lot of processing power crunching the same WU over and over like this. I'm running Linux, so I'm using the Linux version of BOINC - latest version.
It's getting to the point where I'm thinking of ditching this project. I believe it's a hugely worthwhile one, but right now it's just not working efficiently enough to be worthwhile. My crunching power would be better put to use on something that works!
I've done all the obvious things such as resetting the project, rebooting, etc., but it's made no difference. Is this a known issue, or does anyone have any suggestions as to why I'm getting all these errors and what, if anything, I can do about it? TIA. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
This is almost always caused by a hardware issue. Are you overclocking?
|
||
|
|
justakiwibird
Cruncher New Zealand Joined: Mar 8, 2006 Post Count: 41 Status: Offline Project Badges:
|
No overclocking, and up until recently I was crunching Seti and Rosetta on this machine, with no problems at all. I seriously doubt it's a hardware issue as it's specific to WCG projects. It does seem to have gotten worse since I changed my project preferences to include the new cancer project, but whether that's just a fluke or not I have no idea.
I don't care about stats at all, but I'd like to be contributing what little processing power I have - and right now it feels like I'm wasting most of it, crunching around in circles. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Okay then. While you shouldn't rule out hardware problems just because only WCG has this issue, we'll take a look at the alternatives.
The other possibilities include a problem caused by sleeping or hibernating your computer, and conflicts with other software. One thing you could try is toggling the "Leave applications in memory while preempted?" setting in your device profile. |
||
|
|
justakiwibird
Cruncher New Zealand Joined: Mar 8, 2006 Post Count: 41 Status: Offline Project Badges:
|
Quote: Okay then. While you shouldn't rule out hardware problems just because only WCG has this issue, we'll take a look at the alternatives. The other possibilities include a problem caused by sleeping or hibernating your computer, and conflicts with other software.
I don't use any hibernation features. My machine is on 24/7 so I do have the monitor set to switch off after 20 minutes, but I don't think that should affect anything.
Quote: One thing you could try is toggling the "Leave applications in memory while preempted?" setting in your device profile.
Somebody suggested that a while back so I've done that but again, it hasn't made any difference. What kinds of hardware problems are you thinking of? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've done a little research, and I think I have a better handle on the problem now.
There used to be rare cases when this error would occur under normal conditions, with the work unit actually completed. I understand that's fixed now (besides, it was Rosetta).
The most mentioned cause is a problem with the system clock. If your realtime clock isn't keeping time and is being frequently jumped a couple of minutes when it is corrected using NTP, then this would cause your error.
Failing that, there is something preventing the BOINC client from communicating properly with the science application. As I said, this is known to occur during hibernation on some (but not all, or even many) computers. However, you've ruled that out. This leaves an OS issue, a problem with your antivirus or security software, virus/spyware activity, or (as mentioned before) some random and hard-to-trace hardware issue. I strongly suspect that reinstalling your operating system will magically fix the problem, but I fully understand that you may not want to do that.
Check your clock. Look in the event log (wait, you said Linux). Well, look wherever it is that ntpd keeps its logs, and see if it has been struggling to keep the clock synched. |
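One way to check the "clock is being jumped" theory without digging through ntpd logs is to compare the wall clock against a monotonic clock, which is not stepped when the time is corrected. The following is only a rough Python sketch of that idea; the function names `clock_jump` and `watch`, the 10-second polling interval, and the 5-second threshold are all made up for illustration, and nothing here is part of BOINC or the WCG software.

```python
import time


def clock_jump(prev_wall, prev_mono, wall, mono, threshold=5.0):
    """Return the apparent wall-clock jump in seconds, or None if it is small.

    The monotonic clock is not stepped by time corrections, so any large
    difference between elapsed wall time and elapsed monotonic time means
    the wall clock was adjusted between the two samples.
    """
    jump = (wall - prev_wall) - (mono - prev_mono)
    return jump if abs(jump) >= threshold else None


def watch(interval=10.0, threshold=5.0):
    """Poll both clocks forever and report any jump at or above the threshold."""
    prev_wall, prev_mono = time.time(), time.monotonic()
    while True:
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        jump = clock_jump(prev_wall, prev_mono, wall, mono, threshold)
        if jump is not None:
            print(f"{time.ctime(wall)}: wall clock jumped by {jump:+.1f} s")
        prev_wall, prev_mono = wall, mono
```

Calling `watch()` in a terminal and leaving it running while BOINC crunches should show whether reported jumps line up with the failed work units. A negative value means the clock was set backwards.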
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Yep, switched that auto clock sync off... In fact, you can replicate the error by hitting the sync button when it's a few minutes off, or by just manually changing the time. Setting it back is the best way to trigger it.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To avoid more serious errors caused by an incorrect clock, it is probably better to configure (S)NTP properly, and sync as frequently as required. The correction should only be tiny, and shouldn't cause a jump at all - it works by making the length of a "second" a tiny bit longer or shorter. Complicated stuff, particularly if you're playing with NTP not SNTP.
Windows, for example, will only jump the clock if it needs to make a correction of more than 15 seconds. Well, if it's set up properly. YMMV. |
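To put rough numbers on the "longer or shorter second" approach: the arithmetic below assumes a maximum slew rate of 500 parts per million, which is the figure ntpd documents; other sync software may use a different rate. The helper name `slew_seconds` is made up for this sketch.

```python
MAX_SLEW_PPM = 500  # assumed maximum slew rate, in parts per million


def slew_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Real time, in seconds, needed to absorb offset_s purely by slewing."""
    return abs(offset_s) * 1_000_000 / slew_ppm


# A clock that is two minutes out takes a long time to fix without a jump.
two_minutes = slew_seconds(120)
print(f"{two_minutes:.0f} s, about {two_minutes / 86400:.1f} days")
```

At that rate a two-minute error takes roughly 2.8 days to correct gradually, which is why a clock that drifts badly between syncs tends to get stepped instead, and why syncing often enough to keep the correction tiny matters.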
||
|
|
justakiwibird
Cruncher New Zealand Joined: Mar 8, 2006 Post Count: 41 Status: Offline Project Badges:
|
Quote: The most mentioned cause is a problem with the system clock. If your realtime clock isn't keeping time and is being frequently jumped a couple of minutes when it is corrected using NTP, then this would cause your error.
Ah! That might be it. I've been having a bit of a problem with my system clock lately - I'm on dial-up and only online for sporadic periods of time. Combine that with the fact that I haven't been able to get any time server synching to work. I posted elsewhere about it a while back but nothing anyone suggested worked - running the ntp daemon doesn't work well when you're on dial-up and online for such sporadic periods of time, and even when I tried to update manually, all I got was errors about the server not being reachable. I haven't managed to connect to ANY time servers for some weird reason. Not my firewall or anything like that, so I have no idea what it is.
After the problems I had the other day trying to get my time back in synch I ended up having to reset the clock via CMOS setup. Right now (now that I've checked) it's about two minutes slow. I'm starting to wonder if maybe it's time for a new CMOS battery.
Anyway, thanks for your help everyone. I think maybe you've hit the nail on the head. BTW, if anyone knows how the heck I can get my time synching working (SUSE10) I'd be grateful for any suggestions. At the moment, even if I replace the battery I still have no way of keeping my time in check.
Quote: I strongly suspect that reinstalling your operating system will magically fix the problem, but I fully understand that you may not want to do that.
That's why I run linux - none of those "Just reinstall Windows" solutions to EVERY little glitch |
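For a one-off check while the dial-up link is up, a single SNTP query is enough to see how far off the clock is, and whether a time server can be reached at all. This is a minimal Python sketch, not a replacement for a proper NTP client: the function names are invented here, `pool.ntp.org` is only an assumed default server, and the offset is a crude estimate that ignores the server's receive timestamp.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def build_request():
    """48-byte client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # The transmit timestamp sits in bytes 40-47: seconds, then a 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server="pool.ntp.org", timeout=5.0):
    """Rough estimate of (server time - local time) in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Treat the server's timestamp as the midpoint of the round trip.
    return parse_transmit_time(packet) - (sent + received) / 2
```

If `query_offset()` raises a timeout against several different servers, that suggests outbound UDP port 123 is being blocked somewhere between the machine and the ISP rather than a problem with one particular server.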
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If you have other computers networked, you can set one of them up as a time server. Pick one with a reliable clock :-)
|
||
|
|
|