Total posts in this thread: 3593
Posts: 3593   Pages: 360
This topic has been viewed 5824070 times and has 3592 replies
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1326
Status: Offline
Re: Work Available

"Update: The logging in the script is not helpful. (I wrote it...) I'm adding more logging in places that it appears to have failed. I should be able to get to the bottom of it quickly I hope.

Thanks,
-Uplinger"

Thanks for the update Keith. Don't be hard on yourself about the script logging :)
[Nov 25, 2019 10:33:10 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Work Available

Ok, figured out the issue. It will require a code change to help prevent it in the future.

So, as many of you are aware, we are limiting the number of results per hour that are sent out. The check on how many workunits to load is showing -183 at the moment. This is the expected behavior of the application: it is being told to load -183 workunits, so it is not loading anything new. It gets this number because it is checking the number of workunits running on the grid and waiting to be sent. We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder. This is tough, because it is running slow.
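A minimal sketch of the kind of check described above, not the actual World Community Grid loader. The function names, the target of 1000, and the running count of 983 are hypothetical, chosen only so that the result comes out at the -183 quoted in the post; the 200 waiting results is the approximate figure from the post.

```python
def workunits_to_load(target_in_flight: int, running: int, waiting_to_send: int) -> int:
    """Raw result of the check: target minus everything already in the system."""
    return target_in_flight - (running + waiting_to_send)


def load_count(raw: int) -> int:
    """A negative raw value means the loader adds nothing new."""
    return max(0, raw)


# Hypothetical target and running count; ~200 waiting is taken from the post.
raw = workunits_to_load(target_in_flight=1000, running=983, waiting_to_send=200)
print(raw, load_count(raw))  # -183 0
```

In this model, results stuck waiting for reliable hosts keep counting against the target, so the loader stays at zero until they are sent out.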

I'm going to release these work units in question to regular hosts to get us over this bump and work towards a permanent fix in the morning. It will require extra thinking because of the complexity of the estimator.

Anyways, I know that probably was the long version of, "I know the issue, I'll fix it later. For now, I'll put a bandaid on it."

Thanks,
-Uplinger
[Nov 25, 2019 11:30:28 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

Thanks, Keith
[Nov 26, 2019 12:54:25 AM]
RTorpey
Advanced Cruncher
Joined: Aug 24, 2005
Post Count: 67
Status: Offline
Re: Work Available

Just had one WU show up!
[Nov 26, 2019 1:06:19 AM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Status: Offline
Re: Work Available

Received a few re-sends (-2) and a few new work (-0) WUs
[Nov 26, 2019 3:25:12 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

Keith,

Your band-aid seems to be holding well. A WU I had waiting for a wingman has been sent out, and I received a couple of new WUs. Thank you.

I've been thinking about your statement that:
"We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder."

I have to say that I don't understand this. To be honest, I can't remember the exact definition of "reliable", but it surprises me that there aren't enough machines which have acquired that status asking for WUs. Is it just too early in the project, or has something gone wrong with the process which decides on that status?

Or maybe you just need a fall-back plan for this situation which foregoes the reliable status check when the queue backs up too much?
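The fall-back idea can be sketched as a single rule, under stated assumptions. The function name and the `backlog_limit` value are hypothetical and are not part of the BOINC or World Community Grid scheduler.

```python
def may_receive_resend(host_is_reliable: bool, resends_waiting: int,
                       backlog_limit: int = 100) -> bool:
    """Reliable hosts always qualify; any host qualifies once the backlog is too large.

    backlog_limit is an illustrative threshold, not a real project setting.
    """
    return host_is_reliable or resends_waiting > backlog_limit


print(may_receive_resend(False, 50))   # False
print(may_receive_resend(False, 200))  # True
```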

I'm sure you've got better things to do than to indulge me with an answer, but I'm still curious. Maybe someone with more knowledge than me would like to speculate?
[Nov 26, 2019 10:45:40 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Work Available

Apis,

"I've been thinking about your statement that: 'We have about 200 results waiting for reliable hosts, but don't have hosts pulling those off the feeder.' I have to say that I don't understand this."

My speculation is that those 200 results are resends (_2, _3, _4).

"it surprises me that there aren't enough machines which have acquired that status asking for WUs."

Again my speculation: there are enough (reliable) hosts, but something went wrong with the script that distributes the tasks.
[Nov 26, 2019 11:10:09 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

It's that 48-hour return thingie, one of the criteria for a machine to be reliable for a project. If you buffer these, you're bound not to be in. Got app_config set at 1 and profile at 2. One waiting with 29 hours runtime keeps me out. Got a _2 this time though, with the full 7-day deadline. Think that was forced this round... some have reported not making even the 35%, 2 days 10 hours, even when started immediately.
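A short worked check of the figures in this post. It assumes that "the 35%" refers to a shortened deadline equal to 35% of the full 7-day deadline; that reading is an assumption, although it matches the quoted "2 days 10 hours". The helper name is hypothetical.

```python
from datetime import timedelta

FULL_DEADLINE = timedelta(days=7)  # the "full 7 day deadline" from the post


def shortened_deadline(full: timedelta, percent: int) -> timedelta:
    """Deadline cut to a percentage of the full one (integer math keeps it exact)."""
    return full * percent / 100


resend_deadline = shortened_deadline(FULL_DEADLINE, 35)
print(resend_deadline)  # 2 days, 10:48:00
```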
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 26, 2019 11:19:04 AM]
[Nov 26, 2019 11:18:29 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Work Available

The definition of reliable seems much too tight. I have abandoned use of my laptop for ARP because each unit was taking more than 48 hours of crunching, and the laptop was only on 50% of the time, so units were taking more than 4 days to return. It had only 1 error, which was due to too many restarts.

My PC is an i7-3770 and that has been taking 27 hours each, restricted to a maximum of 4 ARP units running and 12 waiting. That has not had any errors, but the 4.5 day turnaround would classify it as unreliable.
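The turnaround figures above can be reproduced with a rough model. It assumes a newly downloaded unit joins the back of a full buffer and that units run in fixed-size batches, which is a simplification; the function names are hypothetical.

```python
import math


def wall_clock_hours(crunch_hours: float, on_fraction: float) -> float:
    """Elapsed time for one unit on a machine that is only on part of the time."""
    return crunch_hours / on_fraction


def worst_case_turnaround_hours(hours_per_unit: float, running: int, waiting: int) -> float:
    """Time until the last unit in a full buffer finishes, running `running` at a time."""
    batches = math.ceil((running + waiting) / running)
    return batches * hours_per_unit


laptop = wall_clock_hours(48, 0.5)            # 48 h of crunching at 50% uptime
pc = worst_case_turnaround_hours(27, 4, 12)   # 4 running, 12 waiting, 27 h each
print(laptop / 24, pc / 24)  # 4.0 4.5
```

Both results, 4 days and 4.5 days, are well beyond a 48-hour return window, even though neither machine is producing errors.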

Mike
[Nov 26, 2019 11:47:56 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

I think what gets me is that stuck resends can completely barf the feeder. I think there should be two queues, one of new work and one of resends. If a machine isn't suitable for a resend, then it should get a WU from the 'new' queue. If the 'new' queue is empty, then the 'committed to other platforms' message would become 'no new work'. If the resend queue gets too full, then the situation should be flagged to the admins -- we don't need to know.
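A minimal sketch of the two-queue idea, purely illustrative. The class, method names, and alert threshold are hypothetical; this is not how the BOINC feeder is actually implemented.

```python
from collections import deque
from typing import Optional


class TwoQueueDispatcher:
    """Keeps resends apart from new work so stuck resends cannot block new work."""

    def __init__(self, resend_alert_threshold: int = 200) -> None:
        self.new_work: deque = deque()
        self.resends: deque = deque()
        self.resend_alert_threshold = resend_alert_threshold  # hypothetical value

    def next_for(self, host_is_reliable: bool) -> Optional[str]:
        """Resends go to reliable hosts first; everyone else draws from new work."""
        if host_is_reliable and self.resends:
            return self.resends.popleft()
        if self.new_work:
            return self.new_work.popleft()
        return None  # corresponds to a 'no new work' message

    def resend_backlog_alert(self) -> bool:
        """True when the resend queue is large enough to flag to the admins."""
        return len(self.resends) > self.resend_alert_threshold


d = TwoQueueDispatcher(resend_alert_threshold=1)
d.resends.extend(["wu_a_2", "wu_b_2"])
d.new_work.append("wu_c_0")
print(d.next_for(host_is_reliable=False))  # wu_c_0
print(d.next_for(host_is_reliable=False))  # None
print(d.next_for(host_is_reliable=True))   # wu_a_2
```

In this sketch an unreliable host never waits behind a resend: it either gets new work or is told there is none.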

Just my cogs turning over again. They need oiling.
[Nov 26, 2019 11:52:14 AM]