World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available

Thread Status: Active
Total posts in this thread: 3593

This topic has been viewed 5824070 times and has 3592 replies

Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline

Re: Work Available

Update: The logging in the script is not helpful. (I wrote it...) I'm adding more logging in places that it appears to have failed. I should be able to get to the bottom of it quickly I hope.

Thanks,
-Uplinger

Thanks for the update, Keith. Don't be hard on yourself about the script logging :)

----------------------------------------

[Nov 25, 2019 10:33:10 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Work Available

Ok, figured out the issue. It will require a code change to help prevent it in the future.

So, as many of you are aware, we are limiting the number of results that are sent out per hour. The check on how many workunits to load is showing -183 at the moment. This is the expected behavior of the application at the moment: the value means "load -183 workunits", so the application is not loading anything new. It gets this number because it is checking the number of workunits running on the grid and waiting to be sent. We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder. This is tough, because it is running slow.
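A minimal sketch of the kind of check described above, assuming the loader subtracts the workunits already running or waiting to be sent from some target. The formula, the function names, and the counts are all hypothetical; the post only gives the result, -183.

```python
def workunits_to_load(target_in_flight: int, running: int, waiting_to_send: int) -> int:
    """Raw loader answer: how many new workunits to add.

    Assumed formula (not confirmed by the post): the target minus
    everything already running on the grid or still waiting to be sent.
    """
    return target_in_flight - (running + waiting_to_send)


def load_batch_size(raw: int) -> int:
    """A zero or negative answer means the loader adds nothing new."""
    return max(0, raw)


# Hypothetical counts, chosen only so the result matches the -183 in the post.
raw = workunits_to_load(target_in_flight=1000, running=983, waiting_to_send=200)
print(raw, load_batch_size(raw))
```

Under this assumption, results stuck waiting for reliable hosts keep counting against the target, so the raw answer stays negative until they are sent out.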

I'm going to release these work units in question to regular hosts to get us over this bump and work towards a permanent fix in the morning. It will require extra thinking because of the complexity of the estimator.

Anyways, I know that probably was the long version of, "I know the issue, I'll fix it later. For now, I'll put a bandaid on it."

Thanks,
-Uplinger

[Nov 25, 2019 11:30:28 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline

Re: Work Available

Thanks, Keith

[Nov 26, 2019 12:54:25 AM]

RTorpey
Advanced Cruncher
Joined: Aug 24, 2005
Post Count: 67
Status: Offline

Re: Work Available

Just had one WU show up!

[Nov 26, 2019 1:06:19 AM]

TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Status: Offline

Re: Work Available

Received a few re-sends (-2) and a few new work (-0) WUs

----------------------------------------

Run Time Stats https://grassmere-productions.no-ip.biz/

[Nov 26, 2019 3:25:12 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Work Available

Keith,

Your band-aid seems to be holding well. A WU I had waiting for a wingman has been sent out, and I received a couple of new WUs. Thank you.

I've been thinking about your statement that:

"We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder."

I have to say that I don't understand this. To be honest, I can't remember the exact definition of "reliable", but it surprises me that there aren't enough machines which have acquired that status asking for WUs. Is it just too early in the project, or has something gone wrong with the process which decides on that status?

Or maybe you just need a fall-back plan for this situation which forgoes the reliable status check when the queue backs up too much?

I'm sure you've got better things to do than to indulge me with an answer, but I'm still curious. Maybe someone with more knowledge than me would like to speculate?

[Nov 26, 2019 10:45:40 AM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline

Re: Work Available

Apis,

"I've been thinking about your statement that: 'We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder.' I have to say that I don't understand this."

My speculation is that those 200 results are resends (_2, _3, _4).

"it surprises me that there aren't enough machines which have acquired that status asking for WUs."

Again my speculation: there are enough (reliable) hosts, but something went wrong with the script that distributes the tasks.

[Nov 26, 2019 11:10:09 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Work Available

It's that 48-hour return thingie, one of the criteria for a machine to be reliable for a project. If you buffer these, you're bound not to be in. I've got app_config set at 1 and profile at 2. One task waiting with 29 hours of runtime keeps me out. Got a _2 this time, though, with the full 7-day deadline. I think that was forced this round... some have reported not even making the 35% (2 days 10 hours), even when started immediately.

----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 26, 2019 11:19:04 AM]

[Nov 26, 2019 11:18:29 AM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline

Re: Work Available

The definition of reliable seems much too tight. I have abandoned using my laptop for ARP because each unit was taking more than 48 hours of crunching and the laptop was only on 50% of the time, so each unit was taking more than 4 days to return. It had only 1 error, which was due to too many restarts.

My PC is an i7-3770, and it has been taking 27 hours per unit, restricted to a maximum of 4 ARP units running and 12 waiting. It has not had any errors, but the 4.5-day turnaround would classify it as unreliable.

Mike
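The turnaround figures in this post can be reproduced with a short calculation. This is only a sketch: it assumes every unit takes the same time and that queued units run in full batches, and the 48-hour cutoff is the reliability criterion as reported by another poster in this thread, not a confirmed rule. The function names are made up for illustration.

```python
import math

RELIABLE_RETURN_HOURS = 48  # as reported in the thread; assumed, not confirmed


def queue_turnaround_hours(hours_per_unit: float, running: int, waiting: int) -> float:
    """Hours until the last queued unit finishes, running `running` at a time."""
    batches = math.ceil((running + waiting) / running)
    return batches * hours_per_unit


def wall_clock_hours(crunch_hours: float, fraction_on: float) -> float:
    """Elapsed hours for a unit on a machine that is only on part of the time."""
    return crunch_hours / fraction_on


pc_hours = queue_turnaround_hours(27, running=4, waiting=12)  # 4 batches of 27 h
laptop_hours = wall_clock_hours(48, 0.5)  # lower bound: units took "more than" 48 h

print(pc_hours / 24, pc_hours <= RELIABLE_RETURN_HOURS)
print(laptop_hours / 24, laptop_hours <= RELIABLE_RETURN_HOURS)
```

With 16 units in total and 4 running at once, the last unit waits through four 27-hour batches, which gives the 108 hours (4.5 days) quoted above.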

[Nov 26, 2019 11:47:56 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Work Available

I think what gets me is that stuck resends can completely barf the feeder. I think there should be two queues, one for new work and one for resends. If a machine isn't suitable for a resend, then it should get a WU from the 'new' queue. If the 'new' queue is empty, then the 'committed to other platforms' message would become 'no new work'. If the resend queue gets too full, then the situation should be flagged to the admins -- we don't need to know.

Just my cogs turning over again. They need oiling.
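A rough sketch of the two-queue idea, just to make it concrete. Nothing here reflects how the real feeder works: the `Dispatcher` class, the workunit names, and the alert threshold are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class Dispatcher:
    new_work: Deque[str] = field(default_factory=deque)
    resends: Deque[str] = field(default_factory=deque)
    alerts: List[str] = field(default_factory=list)
    resend_alert_threshold: int = 200  # hypothetical value

    def request_work(self, host_is_reliable: bool) -> Tuple[Optional[str], str]:
        """Hand out one workunit and a status message for the requesting host."""
        # A backed-up resend queue is flagged to the admins, not to the host.
        if len(self.resends) > self.resend_alert_threshold:
            self.alerts.append(f"resend queue backed up: {len(self.resends)}")
        # Resends go only to hosts that qualify for them.
        if host_is_reliable and self.resends:
            return self.resends.popleft(), "resend"
        # Everyone else falls through to the queue of new work.
        if self.new_work:
            return self.new_work.popleft(), "new"
        return None, "no new work"


dispatcher = Dispatcher(new_work=deque(["wu_b_0"]), resends=deque(["wu_a_2"]))
print(dispatcher.request_work(host_is_reliable=False))
print(dispatcher.request_work(host_is_reliable=False))
print(dispatcher.request_work(host_is_reliable=True))
```

In this sketch a stuck resend never blocks new work: a host that does not qualify for a resend simply gets the next new workunit, or 'no new work' if there is none.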

[Nov 26, 2019 11:52:14 AM]
