Total posts in this thread: 39
This topic has been viewed 6741 times and has 38 replies.
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Long-running HPF2 task

Shaggy? Is that not a form of a hand rolled, talking cigarette with additive? Double Dutch wink laughing
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 21, 2007 1:18:07 PM]
E. Frijters
Senior Cruncher
The Netherlands
Joined: Apr 26, 2007
Post Count: 228
Status: Offline
Re: Long-running HPF2 task

Indeed.... joker! laughing laughing laughing
----------------------------------------
Former grid.org slave


[May 21, 2007 4:49:04 PM]
E. Frijters
Senior Cruncher
The Netherlands
Joined: Apr 26, 2007
Post Count: 228
Status: Offline
Re: Long-running HPF2 task

By the way: this morning the WU was still stuck at the same percentage...

I now killed it... devilish
----------------------------------------
Former grid.org slave


[May 22, 2007 5:09:39 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Long-running HPF2 task

Hi E. Frijters,
Killing it was a good idea. We have multiple threads on this topic, which can get confusing. olympic posted just once in this thread, back in February. Since then he has discovered that HPF2 can get 'stuck' but that restarting it from the last checkpoint fixes the problem. He runs BOINC, so his fix consists of suspending the entire WCG project for several minutes in the PROJECTS tab, then unsuspending it. The HPF2 program picks up at the last checkpoint and works to completion.

Lawrence
[May 22, 2007 12:45:03 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Long-running HPF2 task

Funny.... A few others and I corroborated olympic's method. Why it did not work for E. Frijters eludes me, especially since I remember a post from one of the techs who looked at the issue and at why it would not come out of the loop voluntarily. That was 4-6 moons ago.

Oh well (Procol Harum ?)
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 22, 2007 3:20:26 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Long-running HPF2 task

dseto wrote:

Sekerob

Can you tell me what "Minimum Quorum : 15" means on the work unit validation sheet?

As I indicated I thought that only 15 valid replies were required, and if someone had a long running work unit and it looked like 15 valid replies were soon to be reached that he or she could consider abort.

From your recent reply you indicated all 19 valid replies are needed and new copies would be sent until the 19 is reached.

If you review my copy of a valid work unit I have below it appears that in this case 17 Valid, 2 no reply and 1 error was enough as no new copies have been sent to reach 19 valid replies, am I missing something here??

Could you please explain why NO additional copies were sent out to reach 19 valid results in this case??

I need to defer to knreed / technicians et al on why no additional copies are sent out until the 19 valid are filled. At the start of the HPF2 project on BOINC, the first validation was initially attempted after 11 returns (see here for the announcement). When it says that all 19 are needed, it implies all 19 valid. I have seen on my own work that for each 'error' a new copy is distributed, but I never paid attention to the 'no reply' situation on HPF2.... I have never seen a 'no reply' before on HPF2, so maybe it is a 'bug', like the one discovered on Sunday related to the 'inconclusive' status (separate thread for HPF2).

Some background and information on how HPF2 works on BOINC.

1) The goal of HPF2 is to generate about 33,000 structure predictions per gene. These predictions are then further analyzed by the researchers to identify the most likely structures for the protein. See the official description of the project for more info: http://homepages.nyu.edu/~rb133/Human_Proteome_Folding_Project.html

2) Each result returned by a computer contains between 10 and 35 predictions, depending on a rough estimate of how tough the predictions will be to compute for the gene.

So on a given gene, if we ask each computer to compute 25 structures, then it will take 1,320 results in order to generate the set of 33,000 structures. We could create one workunit for the gene and send it out enough times to make sure we get 1,320 results. However, given the way that BOINC works, this would create some inefficiencies in the system.

What we do is create a set of workunits that will generate the required number of results for the gene. Over the entire set of workunits for the gene, we need to generate an average of about 17.2 valid results per workunit. In this example, where 25 structures are being generated per result, this means that we will create 77 workunits for the gene, each of which needs to average 17.2 valid results with 25 structures per result. 77*17.2*25 = 33,110 structure predictions.

If we have a gene that is tougher and only 10 structures will be generated per result then we would create 192 workunits. 192*17.2*10 = 33,024 structure predictions.
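
The arithmetic in these two examples can be reproduced with a short sketch. This is only an illustration of the numbers quoted above, not the code WCG uses; the function names are made up for the example, and rounding up to a whole number of workunits is an assumption that happens to match the 77 and 192 figures.

```python
import math

TARGET_STRUCTURES = 33_000   # approximate target per gene, as quoted in the post
AVG_VALID_RESULTS = 17.2     # average valid results needed per workunit


def workunits_needed(structures_per_result: int) -> int:
    """Smallest whole number of workunits whose expected yield meets the target.

    Hypothetical helper: rounding up is an assumption, not a documented rule.
    """
    per_workunit = AVG_VALID_RESULTS * structures_per_result
    return math.ceil(TARGET_STRUCTURES / per_workunit)


def expected_structures(workunits: int, structures_per_result: int) -> float:
    """Expected structure predictions: workunits * valid results * structures."""
    return workunits * AVG_VALID_RESULTS * structures_per_result


if __name__ == "__main__":
    for per_result in (25, 10):
        n = workunits_needed(per_result)
        total = expected_structures(n, per_result)
        print(f"{per_result} structures/result: {n} workunits, about {total:,.0f} predictions")
```

With 25 structures per result this gives 77 workunits, and with 10 structures per result it gives 192, the same figures as in the text.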

In order to generate the average of 17.2 valid results per workunit we only have three variables to modify:

1) Initial replicas sent out (initial_replicas)
2) The min quorum (min_quorum)
3) The min results in agreement within a quorum before accepting the result as valid (min_agreement)

The way that BOINC works is that for each workunit, BOINC will initially send out 'initial_replicas' copies of the workunit to be processed. As the results come back in, BOINC will wait until 'min_quorum' is reached before attempting to validate the workunit. During this phase, if a result is aborted or returned as an error, then an additional copy will be sent out.

Once 'min_quorum' results are returned, validation is attempted. Validation is successful if at least 'min_agreement' results are determined to be valid. If there are not at least 'min_agreement' valid results, then all the results are marked as 'INCONCLUSIVE' and an additional result is sent out. Each time an additional result is returned, validation is attempted again. Once we have 'min_agreement' valid results, the results will be marked as valid or invalid as appropriate and credit is awarded.

Normally at this point BOINC would go ahead and 'assimilate' the result (this means copying the results off to be returned to the researchers). However, we have modified the assimilator for HPF2 so that it will wait until the last result is returned or the result with the latest deadline has missed its deadline. Then the assimilator will collect all valid results to return to the scientists. Also - once the validation has been successful, no additional results will be sent out even if additional errors or invalid results are returned.

We currently have initial_replicas set to 19, min_quorum set to 15 and min_agreement at 13. This is yielding about 17.6 valid results per workunit. We want to keep it somewhat above the 17.2 target to ensure that we have the required set of structure predictions.
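
As a rough illustration of the flow described above, here is a toy model of a single workunit. It is a sketch under stated assumptions, not BOINC's or WCG's actual server code: the class and method names are hypothetical, each returned result is simply labelled "valid", "invalid" or "error" up front (the real validator compares results with each other), and the quorum is assumed to count only results returned without error.

```python
from dataclasses import dataclass, field


@dataclass
class WorkunitState:
    """Toy model of one HPF2 workunit (hypothetical, for illustration only)."""
    initial_replicas: int = 19
    min_quorum: int = 15
    min_agreement: int = 13
    sent: int = field(init=False)   # total copies sent out so far
    valid: int = 0
    invalid: int = 0
    errors: int = 0
    validated: bool = False

    def __post_init__(self) -> None:
        self.sent = self.initial_replicas

    def report(self, outcome: str) -> None:
        """Record one returned result: 'valid', 'invalid' or 'error'."""
        if outcome == "error":
            self.errors += 1
            if not self.validated:
                self.sent += 1      # errored or aborted copy is replaced
            return
        if outcome == "valid":
            self.valid += 1
        elif outcome == "invalid":
            self.invalid += 1
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
        if self.validated:
            return                  # after validation nothing more is sent
        # Assumption: only results returned without error count toward quorum.
        if self.valid + self.invalid >= self.min_quorum:
            if self.valid >= self.min_agreement:
                self.validated = True
            else:
                self.sent += 1      # inconclusive: one more copy goes out


if __name__ == "__main__":
    wu = WorkunitState()
    wu.report("error")              # one error before the quorum is reached
    for _ in range(17):
        wu.report("valid")          # two of the copies never reply
    print(wu.sent, wu.valid, wu.validated)
```

In this model, one early error followed by 17 valid returns leaves 20 copies sent and the workunit validated with two copies still outstanding, which is consistent with the "17 Valid, 2 no reply and 1 error" case quoted above: validation succeeded once the quorum was reached, so no further copies were sent.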
----------------------------------------
[Edit 1 times, last edit by knreed at May 22, 2007 5:14:18 PM]
[May 22, 2007 5:11:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Long-running HPF2 task

thanks for the info.
[May 23, 2007 4:46:02 AM]
olympic
Senior Cruncher
Joined: Jun 12, 2005
Post Count: 156
Status: Offline
Re: Long-running HPF2 task

Suspending and resuming WCG in the projects tab has worked for me so far to "unstick" HPF2's. You don't have to suspend it for long, I give it about 30 seconds.
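
For anyone who would rather script this than click through the projects tab, the same suspend/resume step could be sketched with BOINC's command-line tool, boinccmd, which accepts `--project URL suspend` and `--project URL resume`. The project URL below is an assumption (use whatever URL your client shows for WCG), the helper names are made up for the example, and it assumes boinccmd is on the PATH and allowed to talk to the running client.

```python
import subprocess
import time

# Assumption: the WCG project URL as attached in your BOINC client.
WCG_URL = "http://www.worldcommunitygrid.org/"


def project_command(op: str, url: str = WCG_URL) -> list[str]:
    """Build the boinccmd argument list for suspending or resuming a project."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op!r}")
    return ["boinccmd", "--project", url, op]


def bounce_project(pause_seconds: float = 30, url: str = WCG_URL,
                   run=subprocess.run, sleep=time.sleep) -> None:
    """Suspend the project, wait a little, then resume it.

    `run` and `sleep` are injectable so the function can be tried out
    without a BOINC client installed.
    """
    run(project_command("suspend", url), check=True)
    sleep(pause_seconds)
    run(project_command("resume", url), check=True)


if __name__ == "__main__":
    bounce_project()   # about 30 seconds, as suggested in the post
```

Note that this suspends every WCG task on the machine for the duration of the pause, just as the projects tab does.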

It's a very strange anomaly indeed. It's completely random and other members of the quorum seem to complete the WU without issue. It's also rather rare. My machines process ~150 WU/day and sometimes they go many days without issue. 1 in 500-1000 WU would be my rough estimate of the "problem" WU's.
----------------------------------------

[May 23, 2007 6:14:04 AM]
E. Frijters
Senior Cruncher
The Netherlands
Joined: Apr 26, 2007
Post Count: 228
Status: Offline
Re: Long-running HPF2 task

So, WUs seldom crash... I'll keep you posted if it happens again...

Thanks everybody for the support and info!! smile
----------------------------------------
Former grid.org slave


[May 23, 2007 3:10:51 PM]