Total posts in this thread: 70
This topic has been viewed 9312 times and has 69 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: High Load Warning, scheduler request deferred

Yup, it looks like the setting I was using to get around this problem stopped anything that needed a reliable host from being sent. I have pulled the setting, but it has left a pretty large backlog of results needing reliable hosts. I am going to let it run for a bit to see if it can clear on its own; if not, I will give it a little bump to help it.

Thanks,
-Uplinger
[Nov 10, 2014 5:16:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High Load Warning, scheduler request deferred

It's quorum 2 ***, so is 'reliable' really needed for this science, unless you insist on a turnaround of under 48 hours?

*** iirc you wrote that it will be quorum 2 through the end of the project.

This post landed in the wrong thread. It was meant for 'waiting to send' in the ugm forum.
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 10, 2014 8:08:12 AM]
[Nov 10, 2014 6:44:34 AM]
uplinger

The setting for reliable hosts is what controls the 40% cut down in the rate. Yes, a reliable host is required, because a host that is likely to return the work unit, and so bring the average batch completion time down, is highly desirable. If we removed reliable hosts from a project, the average batch turnaround would go from 12 days to 21 days. That would mean 99% of the results sitting around on our servers waiting for that one last result. Say you returned a workunit within 6 hours of it being issued, and user B just never runs it at all. Do you want someone with a machine that consistently returns valid work on a regular basis to help complete the workunit? In our case, yes.

Hope this helps explain the reasoning behind the setting.

Thanks,
-Uplinger
[Nov 10, 2014 7:06:22 AM]
uplinger

It appears that FAHV and CEP2 still have a small backlog of results needing reliable hosts. I am going to let them run their course tonight. If they are still clogged in 7 hours, I will help push them along. But since the other 3 applications have cleared their backlog, I think FAHV and CEP2 will clear in the next few hours.

Thanks for your patience,
-Uplinger
[Nov 10, 2014 7:59:20 AM]
Former Member

Sorry, my post landed in the wrong thread. It was meant for the 'waiting to be sent' discussion in the ugm forum. For zero-redundancy work you'd want a reliable host, but doesn't an error raise a two-part question: was the host at fault, or was something wrong with the task? Either way, we have moved well past 'waiting for wingman'. In my simple view it's really only your -problem- of maintaining server performance and storage demand, and of the batch size you choose, until batches are complete for assimilation.

Did the '12 days or 21 days' refer to the projects with a 7-day deadline or those with 10? With 7 days due, you already have 2 repair cycles of 2.45 days included. What are the chances of these not coming back in their allotted time? I'd think a batch would complete in 12 days for ugm and mcm either way, as the agent will prioritize, and these are the 2 with the large result files going back to wcg. When nodes have a buffer greater than the deadline, the server seems not to send these tasks to those hosts anyhow, reliable-rated or not.
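
The arithmetic behind the 12-day figure can be checked with a short sketch. It assumes one reading of the numbers above: each repair copy is issued only when the previous deadline expires, so the worst case is the original 7-day deadline plus two 2.45-day repair cycles. The names are illustrative and are not World Community Grid settings.

```python
# Worst-case turnaround of one workunit under the ASSUMED model that each
# repair copy goes out only when the previous copy's deadline expires.
ORIGINAL_DEADLINE_DAYS = 7.0   # deadline quoted in the post
REPAIR_DEADLINE_DAYS = 2.45    # repair-cycle length quoted in the post
REPAIR_CYCLES = 2              # number of repair cycles quoted in the post


def worst_case_turnaround(deadline_days, repair_days, cycles):
    """Original deadline plus the given number of back-to-back repair cycles."""
    return deadline_days + cycles * repair_days


total_days = worst_case_turnaround(
    ORIGINAL_DEADLINE_DAYS, REPAIR_DEADLINE_DAYS, REPAIR_CYCLES
)
# Fraction of the original deadline that a repair copy gets.
repair_fraction = REPAIR_DEADLINE_DAYS / ORIGINAL_DEADLINE_DAYS

print(f"worst case: {total_days:.2f} days")
print(f"repair deadline as a share of the original: {repair_fraction:.0%}")
```

Under that assumption the total comes to just under 12 days, which is why a 12-day batch average looks reachable for the 7-day projects without much extra help.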

Maybe this is not my problem, or the volunteers', either, but I just wanted to give outside feedback. It goes back to your feeder dumping more than 47 short-deadline fahv tasks onto my tablet when there was no hope in heck of getting them all done even if it ran 24/7, which it doesn't. It has a 7.3% error rate at that, not exactly 'reliable'. In the end I deleted 15 of them. I was not going to wait for the server to distribute additional copies at deadline and then have my tab start them anyway, duplicating effort in a race of 'who completes and reports first' against the additional wingman sent out after the no-replies. I loathe that. I would really like the servers simply to instruct agents to abort any overdue results and do away with this waste. Do not wait for a 'No reply' to still come back and then hope that one node or the other connects and receives the 'server abort, no longer needed'. Agents can wait up to 24 hours before connecting on their own to report results, i.e. overdue work will get crunched until then.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 10, 2014 8:51:39 AM]
[Nov 10, 2014 8:45:55 AM]
uplinger

I have implemented the new scheduler which should help fix the issue that was causing the problem. Please let me know if you notice any issues with it. Also, it is a configurable option, so I can extend it to new users that have a similar setup.

Thanks,
-Uplinger
[Nov 10, 2014 10:45:14 PM]
Former Member

New scheduler, new switch

World Community Grid 11/10/2014 11:53:10 PM [sched_op] CPU work request: 4447.15 seconds; 0.00 devices
World Community Grid 11/10/2014 11:53:15 PM Scheduler request completed: got 0 new tasks
World Community Grid 11/10/2014 11:53:15 PM Project is temporarily shut down for maintenance
World Community Grid 11/10/2014 11:53:15 PM [sched_op] Deferring communication for 00:27:31
World Community Grid 11/10/2014 11:53:15 PM [sched_op] Reason: project is down
[Nov 10, 2014 11:03:03 PM]
uplinger

hahaha, looks like I put the scheduler in place and forgot to turn the feeder back online, thanks for the catch.

Things should be running now

-Uplinger
[Nov 10, 2014 11:11:04 PM]
uplinger

FYI, just took a look at the loads on the system, and members may get backoff due to high usage of the system. This is due to a defrag we are doing on one of the large filesystems. It is making the IO slow across the system and should be cleared in the next hour.

Thanks,
-Uplinger
[Nov 10, 2014 11:13:57 PM]
Former Member

When I wrote of a 7.3 percent error rate on the tablet, I was actually speaking of runtime. Going by pure result count, it is sitting at 15.99 percent over the last 444 returns for Android, or about 1/6th. This is only with the latest 7.32 version of the application. Since there is no special batch pool for Android afaik, this will significantly hold up general batch completion for fahv. Something to think about when further developing schedulers and feeder pathways.
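
To show how the two figures can differ so much, here is a minimal sketch of both measures. The sample data is hypothetical and chosen only to illustrate the effect: a task that errors out early costs little runtime but still counts as one full failed result. The only real figures used are the 15.99 percent and the 444 returns quoted above.

```python
def error_rates(results):
    """Return (rate_by_count, rate_by_runtime) for (runtime_hours, is_error) pairs."""
    total_count = len(results)
    total_runtime = sum(hours for hours, _ in results)
    error_count = sum(1 for _, is_error in results if is_error)
    error_runtime = sum(hours for hours, is_error in results if is_error)
    return error_count / total_count, error_runtime / total_runtime


# HYPOTHETICAL sample: five valid results of 4 h each, one error after 1 h.
sample = [(4.0, False)] * 5 + [(1.0, True)]
by_count, by_runtime = error_rates(sample)
print(f"by result count: {by_count:.1%}")
print(f"by runtime:      {by_runtime:.1%}")

# Error count implied by the quoted figures (derived, not stated in the post).
implied_errors = round(0.1599 * 444)
print(f"implied errors out of 444 returns: {implied_errors}")
```

In the hypothetical sample the count-based rate is 1/6 while the runtime-based rate is under 5 percent, the same kind of gap as between 15.99 and 7.3 percent.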

Surprisingly, regardless of the 1/6th, repairs are still being sent by the boatload to the tab, so what exactly is this definition of 'reliable'? Is there a per-platform granularity?

Meantime in hands off,

World Community Grid 11/11/2014 10:37:15 PM Message from server: We are currently experiencing high load and are temporarily deferring your scheduler request. Your client will automatically try again later.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 11, 2014 9:46:57 AM]
[Nov 11, 2014 9:45:16 AM]