Thread Status: Active
Total posts in this thread: 4
This topic has been viewed 3702 times and has 3 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2155
Status: Offline
Four wingmen

Sometimes it takes sheer luck to get by with a maximum of only four wingmen, given the limited time to finish and upload a result and the fact that people abort tasks.

This is the story of ARP1_0018335_011 (No, not Ma Baker)


Two copies of ARP1_0018335_011 were distributed on 13-05-2020 at 19:56 (times in UTC).
Only 15 minutes later, wingman _0 abandoned control, so copy _1 was left to my own device.
The WCG server immediately took action and sent out copy _2 (on 13-05-2020 at 20:11).
Unfortunately, the operator of the computer behind copy _2 cancelled that task the next day,
which is why its status became User Aborted on 14-05-2020 at 17:18.
In turn, another wingman was assigned a copy less than two minutes later: _3, with seven days to return the result.

Days went by; in the meantime, my device reported its result for copy _1 (on 16-05-2020 at 09:34).
After that, nothing happened. The deadline for copy _3 arrived (21-05-2020 at 17:20) and there was still no answer from the machine with copy _3.
That left the WCG server one option: to send out the last possible copy (_4) for workunit ARP1_0018335_011.

Things were getting dire as it seemed that there was only one chance left to get a Valid result.
Or was there still a small chance that the result for copy _3 would be reported back to the WCG server?
(The deadline this time was 25-05-2020 at 05:20.)

Again, days went by in which nothing seemed to happen, until the big day finally arrived: 25-05-2020, the last chance to meet the deadline.
Of course, we don't want to be kept in suspense any longer; we want to know what happened. Well, I'll tell ya.

Suddenly, half an hour past midnight, the machine with copy _3 returned its result, more than three days late
(after 10 days and 7 hours) and with a runtime of about four days (96 hours) for that task, but just in time, with a few hours left on the clock.
And what's more, only five minutes later, the result for copy _4 was also handed in.
I was almost in a cold sweat. Did they try to scare me?

Long story short:
Result Name           OS            AVN  Status        Sent Time         Due / Return Time  CPUh   Claimed/Granted
ARP1_0018335_011_4--  Linux Ubuntu  727  Valid         5/21/20 17:21:55  5/25/20 00:35:06   10.69  511.2/1,223.5
ARP1_0018335_011_3--  Linux         727  Valid         5/14/20 17:20:30  5/25/20 00:30:45   96.88  1,389.8/1,223.5
ARP1_0018335_011_2--  LinuxMint     727  User Aborted  5/13/20 20:11:10  5/14/20 17:18:57   0.00   530.5/0.0
ARP1_0018335_011_1--  Linux Fedora  727  Valid         5/13/20 19:56:17  5/16/20 09:34:26   21.76  1,057.2/1,223.5
ARP1_0018335_011_0--  Linux CentOS  -    Detached      5/13/20 19:56:07  5/13/20 20:11:06   0.00   0.0/0.0
[Generated by wcgformat]
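
For anyone who wants to check the "10 days and 7 hours" and "more than three days late" figures, here is a minimal sketch in Python. It assumes the timestamps in the table are M/D/YY in UTC, and it takes the deadline for copy _3 (21-05-2020 at 17:20) from the story above. The names `parse`, `turnaround` and `results` are made up for this illustration and are not part of wcgformat.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"  # assumed layout of the table's timestamps

def parse(ts):
    """Turn a table timestamp such as '5/14/20 17:20:30' into a datetime."""
    return datetime.strptime(ts, FMT)

def turnaround(sent, returned):
    """Elapsed time between sending a copy and getting its result back."""
    return parse(returned) - parse(sent)

# (sent, returned) for the three copies that came back Valid
results = {
    "_4": ("5/21/20 17:21:55", "5/25/20 00:35:06"),
    "_3": ("5/14/20 17:20:30", "5/25/20 00:30:45"),
    "_1": ("5/13/20 19:56:17", "5/16/20 09:34:26"),
}

for copy, (sent, returned) in results.items():
    delta = turnaround(sent, returned)
    print(copy, delta.days, "days", delta.seconds // 3600, "hours")

# How late was copy _3 relative to its own deadline?
deadline_3 = datetime(2020, 5, 21, 17, 20)
late = parse(results["_3"][1]) - deadline_3
print("copy _3 late by", late.days, "days", late.seconds // 3600, "hours")
```

The whole hours are rounded down, so minutes and seconds are dropped from the printed values.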

Moral of the story:
Don't abort your ARP1 task; there's plenty of time left for it (as long as you don't let it get out of hand). Good luck!
(Even if the deadline passes, you'll still have some time left until the next wingman returns their result.)
[May 25, 2020 9:10:07 PM]
yoerik
Senior Cruncher
Canada
Joined: Mar 24, 2020
Post Count: 413
Status: Offline
Re: Four wingmen

I'm in the same boat with an MCM WU. I'll post that story in another thread. Ugh. I returned the result 10 days ago. It's valid.

smh
[May 25, 2020 9:35:18 PM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: Four wingmen

I wonder if this is happening because of the limited number of checkpoints each of these ARP1 tasks has. On my laptop, it's taking about 4 hours between checkpoints to pick up another 12.5% completed. By the afternoon, I quite often have to put these WUs in Suspend mode, since I know they won't make it to the next checkpoint before I shut down my laptop for the day. I manage to get each of these WUs done in 4-5 days of calendar time, but I can see how many other small crunchers run them only to lose their crunch time on many days because the task doesn't reach the next checkpoint before they close down for the day. I kind of wish the programmers of this project would figure out a way to add more checkpoints, maybe after each 5% completed. I suspect more of these WUs would be completed successfully if this change were made.
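
To put rough numbers on that, here is a small Python sketch. It assumes progress is linear, so 4 hours per 12.5% means 8 checkpoints and about 32 hours of crunching in total, and it assumes that everything done since the last checkpoint is lost at shutdown. The 6-hour session and the function name `checkpoint_loss` are hypothetical, chosen only for illustration.

```python
def checkpoint_loss(session_min, interval_min):
    """Return (kept, lost) minutes for one uninterrupted session ended by a shutdown.

    Only work up to the last completed checkpoint is kept.
    """
    kept = (session_min // interval_min) * interval_min
    return kept, session_min - kept

INTERVAL_NOW = 4 * 60              # 240 min per 12.5% checkpoint (from the post)
TOTAL = INTERVAL_NOW * 8           # 8 x 12.5% = 100%, so 1920 min (32 h), assuming linear progress
INTERVAL_5PCT = TOTAL * 5 // 100   # 96 min per checkpoint if there were one every 5%

SESSION = 6 * 60                   # hypothetical 6-hour crunching session

for label, interval in (("12.5% steps", INTERVAL_NOW), ("5% steps", INTERVAL_5PCT)):
    kept, lost = checkpoint_loss(SESSION, interval)
    print(f"{label}: kept {kept} min, lost {lost} min")
```

Under these assumptions, a session shorter than 4 hours keeps nothing at all with the current spacing, which matches the experience of having to suspend the WU in the afternoon.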
[Jun 1, 2020 4:57:22 PM]
phillipspencer
Advanced Cruncher
France
Joined: Apr 9, 2015
Post Count: 71
Status: Offline
Re: Four wingmen

rbotterb wrote:
I wonder if this is happening because of the limited number of checkpoints each of these ARP1 tasks has. On my laptop, it's taking about 4 hours between checkpoints to pick up another 12.5% completed. By the afternoon, I quite often have to put these WUs in Suspend mode, since I know they won't make it to the next checkpoint before I shut down my laptop for the day. [SNIP] I kind of wish the programmers of this project would figure out a way to add more checkpoints, maybe after each 5% completed. I suspect more of these WUs would be completed successfully if this change were made.

I agree completely. When ARP started, I didn't realise how much I would lose when shutting down. While my desktop took 20-24 hours to process a WU, my laptop is slower than yours and took almost 48 hours. If any are marked invalid (as has happened a couple of times for me), then the monitoring and shepherding needed is just not worth it, so I have stopped processing ARP WUs until smaller units or more frequent checkpoints are introduced. A shame, as I wanted to support this important project.
[Jun 4, 2020 12:43:07 PM]