Thread Status: Active
Total posts in this thread: 102
This topic has been viewed 14063 times and has 101 replies
Mr P Hucker
Advanced Cruncher
Joined: Aug 12, 2006
Post Count: 74
Re: Not enough time given for a task

I see some of the ARP tasks have a half week deadline instead of 1 week - why is this? These are often returned a little late by me, due to Boinc being absolutely useless with scheduling. It enters panic mode far too late. I have a 0+3 hour buffer, so there isn't usually a queue, but I run a few projects with multicore tasks, and Boinc just gets all confused, the poor little thing. It sees tasks that can fill all the cores and does them, leaving the ARP tasks sat there doing nothing until the last minute, because there aren't enough of them to fill all the cores. Then it crams them in at the same time as the multicore tasks and wonders why they take longer than expected!

So my question is, if I return one a little late, is it still useful? Or will it have been resent?

I've heard of the following happening: In Primegrid if you're sending trickleups, they give you extra time. In some projects the task is immediately sent to someone else when you miss the deadline, even though you're at 90%, then when you send it in, you get the credits, the project gets the results it needs, but the other guy is wasting his time, and if he's started it, the project doesn't cancel the task!
[Nov 10, 2020 12:05:04 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Re: Not enough time given for a task

Generally speaking, even if you return a unit a little late, you get credit for it. If your unit gets reissued and the other person finishes their copy before you have started yours, yours will be server aborted. If you finish your unit before a re-issued unit has been started by another cruncher, theirs will be server aborted. With these long running units, if you are only a little bit late, maybe a couple of hours, it is likely the other person's unit will be server aborted, if indeed another unit has been issued.
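
As a rough illustration only, the rule of thumb above could be sketched like this in Python. The names `CopyState` and `server_action` are made up for the example and are not BOINC or WCG code; the real server logic may well differ.

```python
from enum import Enum


class CopyState(Enum):
    """Hypothetical states of one copy of a work unit on a volunteer's machine."""
    NOT_STARTED = "not started"
    RUNNING = "running"


def server_action(copy_state: CopyState, result_already_received: bool) -> str:
    """Decide what happens to an outstanding copy, per the rule of thumb above.

    Assumption: a copy is only server aborted if it has not been started yet.
    """
    if result_already_received and copy_state is CopyState.NOT_STARTED:
        return "server abort"
    # Either the result is still needed, or the copy is already running
    # and is left to finish.
    return "keep"


print(server_action(CopyState.NOT_STARTED, True))
print(server_action(CopyState.RUNNING, True))
```

Under these assumptions, a copy that is already running is never cancelled, which is the behaviour being questioned in this thread.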
Hope this helps.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 10, 2020 3:53:48 PM]
Mr P Hucker
Advanced Cruncher
Joined: Aug 12, 2006
Post Count: 74
Re: Not enough time given for a task

That's what I thought would happen. What should happen is it should get server aborted even if they've started it. They're wasting their time doing what's already been completed by someone else, and could be working on another task. If you feel this might upset them by cancelling it halfway, then by all means give them credit for their attempted work, but there's no point in them continuing to run an already completed task. It is possible to do this: the Boinc server does have a command to cancel work in progress.
----------------------------------------
[Edit 1 times, last edit by Peter Hucker at Nov 11, 2020 2:19:27 PM]
[Nov 11, 2020 2:18:23 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 277
Re: Not enough time given for a task

> That's what I thought would happen. What should happen is it should get server aborted even if they've started it.

Or, use it as an extra validation/verification.

I know that on ARP, I get a fair number of _2 and _3 units that have one or more of "User Aborted" or "No Reply" from the previous crunchers. Sometimes when I check after completion, I've seen the "No Reply" become "Too Late" instead.
[Nov 12, 2020 3:21:02 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12439
Re: Not enough time given for a task

The "Too Late" occurs when the re-send validates the original before "No Reply" replies whether started or not.
[Nov 12, 2020 3:51:21 PM]
Mr P Hucker
Advanced Cruncher
Joined: Aug 12, 2006
Post Count: 74
Re: Not enough time given for a task

> The "Too Late" occurs when the re-send validates the original before "No Reply" replies whether started or not.


It makes sense to send it to someone else if I'm running late; the server has no guarantee I'll finish it any time soon. But if I do then send it in late, why leave the other guy calculating stuff for nothing?

I'm not talking about who gets credits here, that really is unimportant. What matters is CPU time is being wasted. The other guy could be processing another work unit instead of redoing what I've already completed.

I know the Boinc server can abort a running task, that option should be used.
[Nov 12, 2020 6:49:56 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 987
Re: Not enough time given for a task

> The "Too Late" occurs when the re-send validates the original before "No Reply" replies whether started or not.


> It makes sense to send it to someone else if I'm running late; the server has no guarantee I'll finish it any time soon. But if I do then send it in late, why leave the other guy calculating stuff for nothing?

The problem here is that there may be times when it is probably more equitable to let a task finish anyway - if the task is (say) 90% complete should it still be cancelled???

I don't think the server normally knows the fine details of task progress - it is probably "Not started" -> "Started" -> "Finished", in which case how can it assess a running job? If a job is aborted whilst running, is there any guarantee that there's an accurate estimate of work done before cancellation? (In re badges and credit, for those it concerns...) Projects like CPDN can do partial credit for tasks that fail to complete because they base credit on trickles rather than on total work - I don't know whether such an approach would be practical for WCG...

(If I'm wrong about aborted task status, would someone who is familiar with the live code please explain what the actual position is!)

Bear in mind that quite a few contributors focus on one or two projects at a time, and if their jobs were being prematurely terminated I suspect they'd be extremely unhappy! There are also lots of folks here who are enthusiastic badge-collectors, and if they kept having partially completed tasks cancelled without contributing to their time used, more "unhappiness" would be seen. Too much of that could lead to contributors giving up!

> I'm not talking about who gets credits here, that really is unimportant. What matters is CPU time is being wasted. The other guy could be processing another work unit instead of redoing what I've already completed.

I think you've left "in my opinion" off the above! :) Whilst I'm inclined to agree about wasted time, see my previous points regarding potential user reaction! Perhaps the answer might be to have a configuration option along the lines of "Should we abort running tasks whose results are no longer needed?" with a note regarding whether credit will be awarded or not.

As for waste, for some ARP1 project users the biggest waste would already have happened if the job is abandoned (whether started or not) because of the enormous download it requires...

> I know the Boinc server can abort a running task, that option should be used.

The "Abort tasks whether started or not" facility is primarily intended for situations where there is something wrong with the tasks or it is known there is something wrong with a particular user's client.

I would love to know what sort of discussions take place at various BOINC project places regarding what to do about late-running tasks! I suspect there's no "one size fits all" solution, and "let it finish" is the easiest implementation (albeit sub-optimal in several ways!)

Cheers - Al.
[Nov 12, 2020 9:31:28 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12439
Re: Not enough time given for a task

There is also the problem that the scientists need whole batches to be returned to them as a single entity so straggling units need to be done quickly. Because of the iterative nature of the batches, we sometimes see original units issued with short deadlines - not just re-sends.

Mike
[Nov 13, 2020 2:45:07 AM]
Mr P Hucker
Advanced Cruncher
Joined: Aug 12, 2006
Post Count: 74
Re: Not enough time given for a task

> The problem here is that there may be times when it is probably more equitable to let a task finish anyway - if the task is (say) 90% complete should it still be cancelled???

Of course, otherwise the processing that user is doing, using his electricity, is going nowhere.

> I don't think the server normally knows the fine details of task progress - it is probably "Not started" -> "Started" -> "Finished", in which case how can it assess a running job? If a job is aborted whilst running, is there any guarantee that there's an accurate estimate of work done before cancellation? (In re badges and credit, for those it concerns...) Projects like CPDN can do partial credit for tasks that fail to complete because they base credit on trickles rather than on total work - I don't know whether such an approach would be practical for WCG...

I shake my head in shame at those who do it for credits.

> Perhaps the answer might be to have a configuration option along the lines of "Should we abort running tasks whose results are no longer needed?" with a note regarding whether credit will be awarded or not.

Good idea, as that would both solve the problem of wasted CPU cycles, and nobody in their right mind would choose "no, let me continue working on something that will be thrown in the bin".

> As for waste, for some ARP1 project users the biggest waste would already have happened if the job is abandoned (whether started or not) because of the enormous download it requires...

So no further harm done cancelling it.

> The "Abort tasks whether started or not" facility is primarily intended for situations where there is something wrong with the tasks or it is known there is something wrong with a particular user's client.

How is this any different? The tasks in question are of no use to the scientists, continuing to run them is moronic.

> I would love to know what sort of discussions take place at various BOINC project places regarding what to do about late-running tasks! I suspect there's no "one size fits all" solution, and "let it finish" is the easiest implementation (albeit sub-optimal in several ways!)

More likely nobody considers it much at all and things get left on defaults until it causes them a noticeable problem.
[Nov 13, 2020 8:56:18 PM]
Mr P Hucker
Advanced Cruncher
Joined: Aug 12, 2006
Post Count: 74
Re: Not enough time given for a task

> There is also the problem that the scientists need whole batches to be returned to them as a single entity so straggling units need to be done quickly. Because of the iterative nature of the batches, we sometimes see original units issued with short deadlines - not just re-sends.

It's actually Primegrid and the Boinc scheduler that caused the problem here. I just joined Primegrid and my buffer is 3 hours. But they gave me (an estimated) 2 weeks! Since they're multicore set to use all the cores of the system, any project with single core tasks like ARP gets paused until the multicore ones are done, since Boinc is too thick to get more singles to fill up the CPU.
[Nov 13, 2020 8:58:42 PM]