World Community Grid Forums
Thread Status: Active Total posts in this thread: 4
bluestang
Senior Cruncher, USA, Joined: Oct 1, 2010, Post Count: 274
Would be nice if there was a way for the Server to Abort Stuck WUs when BOINC communicates back to the project.

Every now and then, I get an OPNG WU that will run for hours before I catch it and manually Abort it. These are WUs that usually take less than 10 min if run concurrently with other WUs. If there was a way to have them caught and aborted after running for a period of time with no increase in WU progress, or something like that, that would be great!
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 384
Quote: "Would be nice if there was a way for the Server to Abort Stuck WUs when BOINC communicates back to the project. Every now and then, I get an OPNG WU that will run for hours before I catch it and manually Abort it. These are WUs that usually take less than 10 min if run concurrently with other WUs. If there was a way to have them caught and aborted after running for a period of time with no increase in WU progress, or something like that, that would be great!"

I think that there is a time limit built into the WU, but it relies on the WU checkpointing - if it is stuck, then it will never get to check whether it is over the limit.
adriverhoef
Master Cruncher, The Netherlands, Joined: Apr 3, 2009, Post Count: 2346
Quote: "Would be nice if there was a way for the Server to Abort Stuck WUs when BOINC communicates back to the project."

The WCG server doesn't have any clue about tasks getting or being stuck, because the BOINC client doesn't know either. The BOINC client is the link between the tasks on your computer and the WCG server. The BOINC client isn't aware of stuck tasks, because it has no definition of "stuck". However, it does know about a time limit. When that limit is exceeded, your 'stuck' task will be aborted.

In the past, I've had GPU tasks that had a time limit of 1.68 hours (about 100 minutes) with an expected duration of only 3 minutes, while they needed more than 100 minutes to run. Sure enough, they were aborted by the BOINC client after 100 minutes of runtime. That's the BOINC client saying "Your task is taking too long, I'm aborting it." It doesn't say "Your task is stuck, I'm aborting it." However, it's the BOINC client's way of saying "Your task is stuck, so I'm aborting it" (if you want to look at it that way).

Quote: "Every now and then, I get an OPNG WU that will run for hours before I catch it and manually Abort it. These are WUs that usually take less than 10 min if run concurrently with other WUs. If there was a way to have them caught and aborted after running for a period of time with no increase in WU progress, or something like that, that would be great!"

Then you need a computer program running in the background to check every now and then whether there is a task that exceeds your limits, meeting your definition of being stuck.

EDIT: added a simple program:

Say you would run this script, letting the process sleep for 5 minutes until the next check, then letting it detect whether there is an OPNG task that has been running for at least 10 minutes, and if there is, abort it. With echoed comments. For your pleasure. Use at your own risk.

while sleep 300; do

[Edit 2 times, last edit by adriverhoef at Oct 27, 2021 6:18:42 PM]
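The script in the post above is not reproduced beyond its first line, so what follows is only a minimal sketch of the approach described there (sleep 5 minutes, look for an OPNG task that has run for at least 10 minutes, abort it). It is not adriverhoef's script. It assumes the `boinccmd` command-line tool is installed and can talk to the running client, and it assumes that `boinccmd --get_tasks` prints, for each task, lines beginning with `name:`, `project URL:`, `active_task_state:` and `elapsed time:` in that order. Those field names are an assumption and may differ between client versions, so check the output on your own machine first. The function name `find_overdue` and the variables `PATTERN`, `LIMIT` and `INTERVAL` are made up for this illustration. Being a POSIX shell script, it targets Linux, macOS or a similar environment.

```shell
#!/bin/sh
# Watchdog sketch: abort running tasks whose name contains PATTERN and
# whose elapsed time has reached LIMIT seconds.
# ASSUMPTION: the field names parsed below match what your version of
# `boinccmd --get_tasks` prints. Verify before relying on this.

PATTERN="${PATTERN:-OPNG}"     # substring identifying the tasks to watch
LIMIT="${LIMIT:-600}"          # seconds a task may run (10 minutes)
INTERVAL="${INTERVAL:-300}"    # seconds between checks (5 minutes)

# Reads a task listing on stdin and prints "project_url task_name" for
# every executing task that matches the pattern and is over the limit.
find_overdue() {
    awk -v pat="$1" -v limit="$2" '
        /^ *name: /              { name = $2; url = ""; state = "" }
        /^ *project URL: /       { url = $3 }
        /^ *active_task_state: / { state = $2 }
        /^ *elapsed time: /      {
            if (index(name, pat) && state == "EXECUTING" && $3 + 0 >= limit)
                print url, name
        }
    '
}

# The loop only starts when the script is called with --run, so the
# function above can be tried on saved output without aborting anything.
if [ "${1:-}" = "--run" ]; then
    while sleep "$INTERVAL"; do
        boinccmd --get_tasks | find_overdue "$PATTERN" "$LIMIT" |
        while read -r url name; do
            echo "Aborting $name (running for ${LIMIT}s or more)"
            boinccmd --task "$url" "$name" abort
        done
    done
fi
```

Note that this checks elapsed time only, not whether progress has stopped, so a slow but healthy task would be aborted too. Started without `--run`, the script does nothing except define the function, which makes it possible to pipe the output of `boinccmd --get_tasks` through `find_overdue OPNG 600` and see what would be aborted. Use at your own risk.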
bluestang
Senior Cruncher, USA, Joined: Oct 1, 2010, Post Count: 274
Yeah, wasn't sure exactly what could be done on the Server side. I guess I was thinking of the "time limit" on a WU as you guys mentioned... thanks for clarifying what I was trying to suggest.

@adriverhoef Thank you very much for that script! I will give it a go tomorrow when I get back to my machines. Also, is that a script for Linux or Windows?

[Edit 1 times, last edit by bluestang at Oct 28, 2021 2:08:42 PM]