World Community Grid Forums
Thread Status: Active Total posts in this thread: 4
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2155
Sometimes it comes down to sheer luck: there is a maximum of only four wingmen, the time to finish and upload a result is limited, and people abort tasks.
This is the story of ARP1_0018335_011… (No, not Ma Baker)

Two copies of ARP1_0018335_011 were distributed on 13-05-2020 at 19:56 (times in UTC). Only 15 minutes later, wingman _0 abandoned control, so copy _1 was left to my own device. The WCG server immediately took action and sent out copy _2 (on 13-05-2020 at 20:11). Unfortunately, the operator of the computer behind copy _2 decided to cancel that task the next day, which is why it got the status User Aborted on 14-05-2020 at 17:18. In turn, another wingman was assigned a copy a few minutes later: _3, with seven days to return the result.

Days went by; in the meantime, my device reported its result for copy _1 (on 16-05-2020 at 09:34). Then nothing happened, so the deadline for copy _3 arrived (21-05-2020 at 17:20), still with no answer from the machine holding copy _3. There was one option left for the WCG server: to send out the last possible copy (_4) for workunit ARP1_0018335_011. Things were getting dire, as it seemed that there was only one chance left to get a Valid result. Or was there still a small chance that the result for copy _3 would be reported back to the WCG server? (The deadline this time was 25-05-2020 at 05:20.)

Again, days went by in which nothing seemed to happen, and then the big day finally arrived: 25-05-2020, the last chance to meet the deadline. Of course, we don't want to be kept in suspense any longer; we want to know what happened.

Suddenly, half an hour past midnight, the machine with copy _3 returned its result, more than three days late (after 10 days and 7 hours) and with a 4 day runtime (96 hours) for that task: but … just in time! And what's more, only five minutes later, the result for copy _4 was also handed in. I was almost in a cold sweat. Were they trying to scare me?
Long story short:
Result Name OS AVN Status Sent Time Due / Return Time CPUh Claimed/Granted
[Generated by wcgformat]

Moral of the story: Don't abort your ARP1 task. (Even if the deadline passes by, you'll still have some time left before the next wingman returns their result.)
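The timing for copy _3 can be checked with a little date arithmetic. This is only a sketch based on the times quoted in the story: the sent time (deadline minus seven days) and the return time ("half an hour past midnight") are assumptions, since the post gives them only approximately, and the helper names `ts` and `describe` are made up for illustration.

```python
from datetime import datetime

FMT = "%d-%m-%Y %H:%M"  # day-month-year, as used in the post (UTC)


def ts(text):
    """Parse a timestamp written the way the post writes them."""
    return datetime.strptime(text, FMT)


def describe(delta):
    """Format a positive timedelta as days, hours and minutes."""
    hours, rem = divmod(delta.seconds, 3600)
    return f"{delta.days} d {hours} h {rem // 60} min"


sent_3 = ts("14-05-2020 17:20")      # assumed: deadline minus seven days
deadline_3 = ts("21-05-2020 17:20")  # stated in the post
returned_3 = ts("25-05-2020 00:30")  # assumed: "half an hour past midnight"
deadline_4 = ts("25-05-2020 05:20")  # stated in the post

turnaround = returned_3 - sent_3     # time from sending to reporting
lateness = returned_3 - deadline_3   # time past copy _3's own deadline
margin = deadline_4 - returned_3     # time left before copy _4's deadline

print("turnaround:", describe(turnaround))
print("late by:", describe(lateness))
print("before copy _4 deadline:", describe(margin))
```

Under those assumptions the turnaround matches the "10 days and 7 hours" and "more than three days late" mentioned above. Note that the margin shown is measured against the deadline of copy _4; in the story, copy _4 actually reported only five minutes after copy _3.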
yoerik
Senior Cruncher Canada Joined: Mar 24, 2020 Post Count: 413
I'm in the same boat with an MCM WU - I'll post that story in another thread. Ugh. I returned the result 10 days ago. It's valid.
----------------------------------------
smh
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401
I wonder if this is happening due to the limited number of checkpoints each of these ARP1 tasks has. I know for my laptop, it's taking about 4 hours or so between checkpoints to pick up another 12.5% completed. Generally when I get to the afternoons, quite often I have to put these WUs in Suspend mode since I know they won't make it to the next checkpoint before I shut down my laptop for the day. I manage to get each of these WUs done in 4-5 days calendar time, but I can see where many other small crunchers are running them only to lose all their crunch time on many days because they don't make it to the next checkpoint before closing down for the day. I kind of wish the programmers of this project would figure out a way to add more checkpoints - maybe after each 5% completed. I suspect more of these WUs would get completed successfully if this change were made.
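To put rough numbers on the checkpoint suggestion, here is a back-of-the-envelope sketch. It assumes the task progresses at a steady rate and takes the "about 4 hours" per 12.5% quoted above at face value; the function name `checkpoint_stats` is invented for this example, and nothing here reflects how the ARP1 application actually schedules its checkpoints.

```python
def checkpoint_stats(percent_per_checkpoint, total_hours):
    """Return (number of checkpoints, hours between checkpoints).

    Assumes progress is linear in time, which is a simplification.
    """
    checkpoints = 100 / percent_per_checkpoint
    interval = total_hours / checkpoints
    return checkpoints, interval


# Figures from the post: roughly 4 hours of crunching per 12.5% of progress.
hours_per_checkpoint = 4.0
total_hours = hours_per_checkpoint * (100 / 12.5)  # estimated run time of the task

current = checkpoint_stats(12.5, total_hours)  # checkpoint every 12.5%
proposed = checkpoint_stats(5.0, total_hours)  # suggested: checkpoint every 5%

for label, (count, interval) in (("current", current), ("proposed", proposed)):
    # Shutting down just before a checkpoint loses almost a full interval.
    print(f"{label}: {count:.0f} checkpoints, up to {interval:.1f} h lost per shutdown")
```

On these assumptions, a checkpoint every 5% would cut the worst-case loss from a single shutdown from about 4 hours to about 1.6 hours on that laptop.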
phillipspencer
Advanced Cruncher France Joined: Apr 9, 2015 Post Count: 71
"I wonder if this is happening due to the limited number of checkpoints each of these ARP1 tasks has. I know for my laptop, it's taking about 4 hours or so between checkpoints to pick up another 12.5% completed. Generally when I get to the afternoons, quite often I have to put these WUs in Suspend mode since I know they won't make it to the next checkpoint before I shut down my laptop for the day. [SNIP] I kind of wish the programmers of this project would figure out a way to add more checkpoints - maybe after each 5% completed. I suspect more of these WUs would get completed successfully if this change were made."

I agree completely. When ARP started I didn't realise how much I would lose when shutting down. While my desktop took 20-24 hours to process a WU, my laptop is slower than yours and took almost 48 hours. If any are marked invalid (as has happened to me a couple of times), the monitoring and shepherding needed is just not worth it, so I have stopped processing ARP WUs until smaller units or more frequent checkpoints are introduced. A shame, as I wanted to support this important project.