| World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I spent over 67 hours running a WU (on my machine alone), and it is then listed as an error on every computer that ran it. No science, no credits, just wasted computer time. Is there a way to filter bad WUs out before sending them? Should we just switch to other projects?
WU: 10001358-10001450
<message> Maximum CPU time exceeded </message>
<stderr_txt>
About to call graphics init
Failed to get VersionInfo size: 1812 |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
WCG sends out 'Cancel' messages to hosts to abort the work units, either manually (5.4.x) or automatically (5.8.x). That only works if the client initiates the contact, for instance through scheduling or by hitting the Update button of BOINC in the Projects tab!
Further coding has been added for a future release to improve the process. It's unfortunate, but if members do not read the fora or the message logs, or are simply off-line for longer periods, it becomes very hard to do anything to stop these rogue processes. Sorry. Added: Obviously, the recommendation is to upgrade post-haste to the latest version, 5.8.15 (with WCG skin for Windows):
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Mar 28, 2007 2:01:36 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"WCG sends out 'Cancel' messages to hosts to abort the work units, either manually (5.4.x) or automatically (5.8.x). That only works if the client initiates the contact, for instance through scheduling or by hitting the Update button of BOINC in the Projects tab! Further coding has been added for a future release to improve the process. It's unfortunate, but if members do not read the fora or the message logs, or are simply off-line for longer periods, it becomes very hard to do anything to stop these rogue processes."
And if the machine is off site? Do all of the "projects" have this problem or is it just Genome Comparison? In other words, which projects don't have this problem, so I can just run them? Thanks in advance! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Bad work units on any WCG project are very rare. Unfortunately, there is no way to detect them beforehand.
However, when it does happen, WCG can act fast. They look at the failed work units, and fix the problem. As far as I know, the recent issue with Genome Comparison has already been fixed (several days ago, now). The bad work units should time out automatically (as yours did), so the worst possible case is you lose a little crunching time. And for that, I apologise. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"..... And if the machine is off site?..... Thanks in advance!"
The logical conclusion would be that off-site machines have a contact schedule, which would then fetch the abort message for the bad work units, whether 'started/in progress' or in the 'ready to run' state, upon remote instruction from the project server. AGAIN, this happens only when the host contacts the server. The case here is to upgrade those off-site machines to the suggested version ASAP, to automate the action. If such machines are on-line on a schedule, I would propose setting the 'connect to' frequency either to the WCG default of 0.3 days (every 7.2 hours) or, if you are concerned about crunching while, for instance, the network is down for a longer period, to 1 day. Obviously, in the worst case that would let bad WUs run amok for that scheduled time.
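The arithmetic behind those two settings can be sketched in a few lines of Python. This is only an illustration, not anything taken from BOINC itself: the function names are made up for this post, and it assumes that a bad WU keeps running until the next scheduled contact, at which point the abort message is fetched.

```python
HOURS_PER_DAY = 24.0


def connect_interval_hours(connect_days: float) -> float:
    """Convert a 'connect to' setting given in days into hours."""
    if connect_days <= 0:
        raise ValueError("connect interval must be positive")
    return connect_days * HOURS_PER_DAY


def worst_case_bad_wu_hours(connect_days: float) -> float:
    """Upper bound on how long a bad WU can run before the next contact.

    Assumption (hypothetical model): the host only learns about the abort
    message at its next scheduled contact, so the worst case is one full
    interval.
    """
    return connect_interval_hours(connect_days)


if __name__ == "__main__":
    for days in (0.3, 1.0):
        hours = worst_case_bad_wu_hours(days)
        print(f"connect every {days} days -> up to {hours:.1f} h")
    # prints:
    # connect every 0.3 days -> up to 7.2 h
    # connect every 1.0 days -> up to 24.0 h
```

So the 0.3 day default caps the exposure at roughly 7.2 hours per bad WU, while the 1 day setting raises that cap to 24 hours.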
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 2 times, last edit by Sekerob at Mar 29, 2007 6:11:06 PM] |
||
|
|
|