World Community Grid - View Thread - Change BOINC's reaction to results returned with delay in order to save participants' CPU time?

World Community Grid Forums

Category: Support

Forum: Suggestions / Feedback

Thread: Change BOINC's reaction to results returned with delay in order to save participants' CPU time?


Thread Status: Active
Total posts in this thread: 12


This topic has been viewed 2234 times and has 11 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline



Hello,
please have a look at the following log of a work unit processed in The Clean Energy Project:

Workunit Status
Project Name: The Clean Energy Project
Created: 11.05.09
Name: E000580_084B_001t2n00x
Minimum Quorum: 2
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000580_084B_001t2n00x_2-- | 631 | Valid | 23.05.09 00:09:28 | 24.05.09 01:53:30 | 19.57 | 390.1 / 306.4
E000580_084B_001t2n00x_0-- | 631 | Valid | 13.05.09 00:19:25 | 19.05.09 05:55:29 | 22.17 | 310.0 / 306.4
E000580_084B_001t2n00x_1-- | 631 | Valid | 13.05.09 00:10:44 | 23.05.09 02:18:23 | 39.95 | 302.7 / 306.4

The point is: the minimum quorum is 2 replications.
In reality, however, the work unit has (in my opinion unnecessarily) been processed three times.
The reason for processing it three times instead of two is that my computer returned the result 1 hour too late.
I understand that time limits are necessary to guarantee proper project progress.
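To put a number on the redundancy, the figures in the workunit table can be totalled. This is a small sketch that only re-uses the table's values (the variable names are illustrative); the actual deadline is not shown in the log, so it computes the late result's turnaround and how long the extra replica had already been out when the late result came back, rather than the exact overrun.

```python
from datetime import datetime

# CPU hours per result, copied from the workunit table above
cpu_hours = {
    "E000580_084B_001t2n00x_0--": 22.17,
    "E000580_084B_001t2n00x_1--": 39.95,  # the late result
    "E000580_084B_001t2n00x_2--": 19.57,  # the extra replica issued after the deadline
}
total = sum(cpu_hours.values())
extra = cpu_hours["E000580_084B_001t2n00x_2--"]
share = extra / total

# Timestamps use the table's dd.mm.yy format
fmt = "%d.%m.%y %H:%M:%S"
late_sent = datetime.strptime("13.05.09 00:10:44", fmt)
late_returned = datetime.strptime("23.05.09 02:18:23", fmt)
replica_sent = datetime.strptime("23.05.09 00:09:28", fmt)

print(f"CPU spent on the workunit: {total:.2f} h")
print(f"extra replica: {extra:.2f} h ({share:.1%} of the total)")
print("turnaround of the late result:", late_returned - late_sent)
print("replica already out for:", late_returned - replica_sent)
```

Roughly a quarter of the CPU time spent on this workunit went into the third copy.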

Nevertheless, the third processing of the same work unit could be avoided if BOINC were modified.

The current status is the following: on finding out that a necessary result has not been returned in time, the WCG server sends out the same work unit again. The replication is then three instead of two. The third computer processes the whole work unit, even if the second computer, which is late in returning its result, gives back the result before the third computer has finished processing the work unit.

In my opinion, one of the next versions of BOINC should contain changes leading to the following target state:
If minimum replication is two and the second computer does not return a (valid) result in time, the work unit is sent out a third time, just as now. However, if the second client finishes processing the work unit before the third one and returns it (valid) to the WCG server, the WCG server sends a signal to the third client to stop processing the work unit.
In this way, we have a valid result, and the third computer, whose result would be redundant, is free to process another work unit sooner.
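The proposed server-side behaviour can be sketched as follows. This is only a model of Martin's proposal, not how the BOINC server is implemented: the `Workunit` class, its state strings and `report_valid` are hypothetical names, and validation is simplified so that every returned result counts as valid.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Hypothetical server-side view of one workunit and its replicas."""
    min_quorum: int
    # result name -> "in_progress", "valid" or "stop_requested"
    results: dict = field(default_factory=dict)

    def report_valid(self, name: str) -> list:
        """Record a returned (assumed valid) result. Once the quorum is met,
        flag every replica still in progress and return their names so the
        server can tell those clients to stop."""
        self.results[name] = "valid"
        valid = sum(1 for s in self.results.values() if s == "valid")
        if valid < self.min_quorum:
            return []
        pending = [n for n, s in self.results.items() if s == "in_progress"]
        for n in pending:
            self.results[n] = "stop_requested"
        return pending


# The scenario from the log: _1 misses its deadline, _2 is issued,
# then _1 comes back first and completes the quorum.
wu = Workunit(min_quorum=2, results={"_0": "in_progress", "_1": "in_progress"})
wu.report_valid("_0")
wu.results["_2"] = "in_progress"
print(wu.report_valid("_1"))
```

Whether the stop signal actually reaches the third client in time is a separate question, which the replies below get into.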

I ask for an answer and would be very happy if my proposal could be considered and programmed into one of the next versions of BOINC.
Greetings
Yours
Martin

[May 24, 2009 9:57:07 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline



There's a catch: by reporting late, you caused the 3rd computer to waste time, so you can help by reducing your cache/additional buffer size.

The servers immediately set a signal for any client holding a redundant result that has not yet been returned, but if the client started such a task BEFORE contacting the server and receiving the signal, the task is allowed to finish, so all the time is credited in full.

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

[May 24, 2009 10:22:53 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline



Added:

Client 6.6.x has a feature that contacts the server to check whether a task that is already late and not yet started is redundant, and aborts it upon confirmation. Perhaps a future server and client version will be informed of started tasks, so the server can hold off sending out extra copies.

NB: I'm not sure of the exact functioning; this is just a broad outline. The idea, though, is always to optimize overall grid efficiency.

----------------------------------------
[Edit 2 times, last edit by Sekerob at May 24, 2009 10:36:05 AM]

[May 24, 2009 10:25:38 AM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Hmm, I remember this functionality being discussed, but AFAIK it wasn't added, at least not yet. What is included is:

1; All unstarted work is aborted when it passes the deadline.
2; A new function was added to the API so that, instead of continuing past the deadline, the application can choose to end early and send whatever it has already done back to the server.

#1 works automatically for all projects, while #2 must be added to each application if the project wants to use it.
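The two client-side behaviours described above can be summarised as a small decision function. This is a sketch under assumptions: `on_deadline_check`, `Action` and the `app_supports_early_exit` flag are hypothetical names used for illustration, not the BOINC client's code or API.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ABORT_UNSTARTED = "abort (not started, past deadline)"
    REPORT_PARTIAL = "end early and report partial work"


def on_deadline_check(now: float, deadline: float, started: bool,
                      app_supports_early_exit: bool) -> Action:
    """Hypothetical model of the client's handling of a task at `now`."""
    if now <= deadline:
        return Action.CONTINUE
    if not started:
        # point 1: automatic, applies to all projects
        return Action.ABORT_UNSTARTED
    if app_supports_early_exit:
        # point 2: only if the application uses the new API function
        return Action.REPORT_PARTIAL
    # otherwise the started task runs to completion and is reported late
    return Action.CONTINUE


print(on_deadline_check(now=11, deadline=10, started=True, app_supports_early_exit=False))
```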

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[May 24, 2009 11:52:15 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline



Hello,
I have reduced my cache.
I had increased it only for an exceptional situation (holidays, with the risk of the internet connection breaking down in my absence and being unable to get new work units as a consequence).
Nevertheless, it would be good if redundant work units were cancelled the moment they become redundant.
To avoid frustrating users, in my opinion such a work unit should be displayed as completed, and the points earned up to the moment it became redundant should be granted.
By the way, I am not a total newbie here. kafejka and I are one and the same person.
All the best to you, and thanks to all community advisors; it is a pleasure to cooperate with you.
Sometimes, however, it is not too easy to understand your answers.
Yours
Martin Schnellinger

[May 24, 2009 2:53:51 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline



No worries. As for the cancelling, point 2 of Ingleside's addition could create that possibility for WCG... but whether it pleases everyone is a different matter, particularly for those who are motivated to squeeze the last second out of 'short supply' work. I think, though, that WCG first and foremost strives for the highest efficiency in the most 'set and forget' environment possible, so credit for time spent is due.

Happy crunching.

[May 24, 2009 3:10:19 PM]

Steve WCG
Senior Cruncher
Joined: May 4, 2009
Post Count: 216
Status: Offline



First let me state that while I personally would be all for Martin's idea of stopping a crunch as soon as it is no longer relevant, I don't think it is a good idea for grid computing in general. One issue is that it turns crunching into a race against everyone else, which directly contradicts the concept of grid computing: everyone working together towards a common goal. Another point is that people with fast machines and small caches would always win (see how quickly the idea of someone returning a WU first equates to winning?). I doubt that people who are absolutely dedicated but do not have a monster PC would feel good about *losing*. Do they start dropping off the grid because they feel they are not contributing? And if people start reducing the size of their cache to play the *who is first* game, do we think the idle cycles caused by outages (client or server) will outnumber the cycles wasted on redundant tasks?

----------------------------------------
[Edit 2 times, last edit by Steve WCG at May 24, 2009 4:39:47 PM]

[May 24, 2009 4:38:04 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline



It's a good point: those driven by total efficiency are OK with mid-trip cancellation, but it doesn't suit everyone. There's a solution for everything, though... an option in the settings that says yes/no to permitting auto-abort for applications that have the built-in functionality. ;)

[May 24, 2009 5:02:55 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

The biggest problem with this is that, for the user getting the replacement WU, there's normally no reason for the client to connect before the task is (nearly) finished, so it wouldn't get the abort message anyway. Please remember that all connections are initiated by the client; if server-initiated connections had been possible, very many users would have blocked them or just stopped running WCG altogether.

For the user reaching the deadline, either uploading whatever is already done or asking for a deadline extension wouldn't cause any extra replication, so either would be an improvement.
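Because every connection is client-initiated, an abort message sits on the server until the client's next scheduler contact. The sketch below models that delay. The function names and the contact schedule in the example are hypothetical, chosen to mirror a host with a short queue that starts the replica immediately and next contacts the server when the task is done.

```python
import bisect
from typing import List, Optional


def first_delivery(redundant_at: float, contact_times: List[float]) -> Optional[float]:
    """Time of the first scheduler contact strictly after the task became
    redundant, or None if the client never contacts the server again.
    `contact_times` must be sorted in ascending order."""
    i = bisect.bisect_right(contact_times, redundant_at)
    return contact_times[i] if i < len(contact_times) else None


def abort_arrives_in_time(redundant_at: float, started_at: float,
                          contact_times: List[float]) -> bool:
    """An 'abort if not started' message only saves CPU time if it is
    delivered before the replica starts running."""
    delivery = first_delivery(redundant_at, contact_times)
    return delivery is not None and delivery < started_at


# Hours since the replica was fetched: short queue, started at once,
# next contact only when the task is finished.
print(first_delivery(6.0, [0.0, 19.5]))
print(abort_arrives_in_time(6.0, 0.0, [0.0, 19.5]))
```

For a host with a long queue, where the replica would not start for a couple of days, the same message could arrive in time.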

[May 24, 2009 6:58:24 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Project Badges:

180 day badge for Human Proteome Folding

90 day badge for Human Proteome Folding - Phase 2

45 day badge for Help Cure Muscular Dystrophy - Phase 2

90 day badge for Computing for Clean Water

14 day badge for Uncovering Genome Mysteries

45 day badge for Outsmart Ebola Together

180 day badge for FightAIDS@Home - Phase 2

1 year badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

180 day badge for OpenPandemics - COVID-19


Re: Change BOINCs reaction on results returned with delay in order to save participants CPU time?

The task abort code works as follows:

1) If a task is no longer needed (i.e. validation on the workunit has completed and the canonical result has been identified), then an abort if not started message is added to the scheduler request reply the next time the client communicates with the server. This will cause the task to be aborted if it has not yet been started.

2) If the task cannot be used (the workunit has been canceled, or the task is so late that the workunit has been deleted, etc.), then a task abort message is added to the scheduler request reply the next time the client communicates with the server. This will cause the task to be aborted no matter what.

There was the intention to add a preference to BOINC at some point that would allow a user to change how #1 above is handled, so that it would behave either as an 'abort if not started' or as an unconditional 'abort'. I do not believe that this has been added yet.
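The two rules, plus the preference that was planned but not yet added, can be sketched like this. The names (`Msg`, `server_message`, `client_should_abort`, `always_abort_pref`) are hypothetical and do not reflect BOINC's actual scheduler code or reply format; the sketch only encodes the logic described above.

```python
from enum import Enum


class Msg(Enum):
    NONE = 0
    ABORT_IF_NOT_STARTED = 1
    ABORT = 2


def server_message(canonical_found: bool, wu_unusable: bool) -> Msg:
    """Which message the server attaches to the next scheduler reply."""
    # Rule 2 is checked first: a canceled or deleted workunit makes the task useless
    if wu_unusable:
        return Msg.ABORT
    # Rule 1: validation done and the canonical result identified
    if canonical_found:
        return Msg.ABORT_IF_NOT_STARTED
    return Msg.NONE


def client_should_abort(msg: Msg, started: bool, always_abort_pref: bool = False) -> bool:
    """How the client applies the message. `always_abort_pref` models the
    intended (hypothetical) user preference that turns rule 1 into a hard abort."""
    if msg is Msg.ABORT:
        return True
    if msg is Msg.ABORT_IF_NOT_STARTED:
        return (not started) or always_abort_pref
    return False
```

With the preference left at its default, a started task that receives `ABORT_IF_NOT_STARTED` keeps running and is credited in full, which is the behaviour explained in the next paragraph.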

The reason there are two behaviors here is that 'credit' is the reward given to members for their contribution. I do not want to argue about how important this is in this thread. Suffice it to say: suppose you are Contributor A and you have a task that goes past its deadline. Contributor B has a computer that is always on and has a short queue, and is assigned the additional replica that is sent out for the workunit. Contributor A returns the result 6 hours after the deadline, the workunit validates, and the canonical result is archived. Contributor B's computer, which has been working on the result for 6 hours, contacts the server to get more work and finds out that its result is no longer required. If the result were aborted and returned as an error (aborts are considered errors), Contributor B would get no credit for the work done. This is unfair, and so the rules above were implemented.

Could something else be implemented? Yes. However, the situation above is not common. Most results that miss their deadline are never returned (or are returned long after the workunit has been deleted from the system, which means they are over 1 week late). It is much more useful to spend the time doing what can be done to ensure that computers can return their results on time. We are working towards providing workunits of different sizes that will be sent to computers able to complete them within the allocated time frame. This will do much to avoid the problem in the first place.
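One way to picture sending differently sized workunits to hosts that can finish them in time is the sketch below. The estimate (queue wait plus CPU hours divided by the hours per day the host actually crunches, on one core) and the 0.8 safety margin are simplifying assumptions for illustration, not WCG's actual scheduling rule; `pick_wu_size` is a hypothetical name.

```python
from typing import List, Optional


def pick_wu_size(sizes_hours: List[float], deadline_days: float, queue_days: float,
                 on_fraction: float, safety: float = 0.8) -> Optional[float]:
    """Return the largest workunit size (CPU hours) that the host is expected
    to finish within `safety * deadline_days`, or None if none fits.
    Assumes one core crunching for `on_fraction` of each day after the
    task has waited `queue_days` in the local cache."""
    if on_fraction <= 0:
        raise ValueError("on_fraction must be positive")
    budget = safety * deadline_days
    fitting = [s for s in sorted(sizes_hours)
               if queue_days + s / (24.0 * on_fraction) <= budget]
    return fitting[-1] if fitting else None


# A host crunching 6 hours a day with a 3-day cache and a 10-day deadline
print(pick_wu_size([10, 20, 40], deadline_days=10, queue_days=3, on_fraction=0.25))
```

The example also shows why the cache size matters: with a 9-day cache, no size fits the budget at all.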

[May 26, 2009 2:08:23 PM]
