| World Community Grid Forums
Thread Status: Active Total posts in this thread: 2
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline
Suggestion for a failure control mechanism based on introducing Result Status tallies on a per-project basis.
I quite often see posts about what I call runaway failures, where tasks from one project fail continuously on a system. These are often caused by non-WCG processes, but they nonetheless affect some projects more than others. My suggestion is that the servers keep an error tally per system and, when the error count gets too high, email the cruncher to advise of the problem and send only other task types. The email could tell the cruncher that their project selection has been changed to run different tasks (if they had selected only one project) because that project's failure rate on that system was too high, and that they should restart their system, which often resolves the situation. If the cruncher has more than one project selected, that system would simply stop receiving the failing project until the cruncher selects it again (hopefully after a system restart).
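To make the idea concrete, here is a rough sketch of how such a tally might work. It is only an illustration, not how the WCG servers actually behave: the class name, the threshold of 5 consecutive errors, and the fallback list of other projects are all assumptions of mine, since the suggestion only says the error number is "too great".

```python
from collections import defaultdict

# Hypothetical threshold; the suggestion does not name a number.
MAX_ERRORS = 5


class ErrorTally:
    """Sketch of a per-system, per-project error tally (all names are illustrative)."""

    def __init__(self, max_errors=MAX_ERRORS):
        self.max_errors = max_errors
        self.errors = defaultdict(int)  # (host_id, project) -> consecutive errors
        self.blocked = set()            # (host_id, project) pairs no longer sent

    def record_result(self, host_id, project, ok):
        """Record one returned result. Returns True when the cruncher should be emailed."""
        key = (host_id, project)
        if ok:
            self.errors[key] = 0  # a good result resets the tally
            return False
        self.errors[key] += 1
        if self.errors[key] >= self.max_errors and key not in self.blocked:
            self.blocked.add(key)
            return True
        return False

    def eligible_projects(self, host_id, selected, fallback):
        """Projects this system may still receive.

        If every selected project is blocked (e.g. only one was selected),
        fall back to other projects, as the suggestion describes.
        """
        allowed = [p for p in selected if (host_id, p) not in self.blocked]
        if allowed:
            return allowed
        return [p for p in fallback if (host_id, p) not in self.blocked]

    def reselect(self, host_id, project):
        """The cruncher selects the project again, ideally after a restart."""
        self.blocked.discard((host_id, project))
        self.errors[(host_id, project)] = 0
```

Counting consecutive errors rather than a lifetime total is a design choice in this sketch: an occasional bad result then never blocks a project, while a true runaway trips the threshold quickly.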
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
The book of knreed dreams lists a feature to automatically stop sending a specific science to a client if it always fails, accompanied by a red message logged in the client. With client 6.12, that could take the form of a popup in the new Notices window of BOINC. WCG would then occasionally send a trial task for that constantly failing science app to check whether the problem has been resolved.
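The "trial task once in a while" part could look something like the sketch below. This is purely illustrative and not taken from any WCG or BOINC code: the `Suspension` record, the function names, and the 24-hour interval are assumptions, since no actual interval is given.

```python
from dataclasses import dataclass

# Hypothetical interval between trial tasks, in hours.
TRIAL_INTERVAL_HOURS = 24.0


@dataclass
class Suspension:
    """State for one science app that is currently not being sent to one client."""
    last_trial_hour: float  # when the last task (or trial task) for this science was sent
    active: bool = True     # True while the science is still held back


def should_send_trial(s, now_hour, interval=TRIAL_INTERVAL_HOURS):
    """A trial task is due once the interval has passed since the last one."""
    return s.active and now_hour - s.last_trial_hour >= interval


def record_trial(s, now_hour, succeeded):
    """Note the trial outcome; a success lifts the suspension."""
    s.last_trial_hour = now_hour
    if succeeded:
        s.active = False
```

A failed trial simply restarts the waiting period, so a client that is still broken wastes at most one task per interval.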
------------------------------------------//--
WCG
Please help to make the Forums an enjoyable experience for All!