Thread Status: Active
Total posts in this thread: 18
This topic has been viewed 2092 times and has 17 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
It would be nice if the scheduler thought about efficiency...

The scheduler has developed a nasty habit of dumping CEP2 jobs on me that abruptly preempt tasks that are already in progress.


Observe the number of tasks "waiting to run" that are partially completed - but more pertinently, observe the running tasks that have almost identical "progress"/completion percentages.

While I understand the need to get some tasks pushed out and returned pronto (an acceptable hazard of being a "reliable" computational source), I surely wish the scheduler were "bright" enough to take into account the number of configured available cores per machine, and limit the dumping of jobs with "near real-time" deadlines to no more than 50% of that machine's available cores.

When higher-priority (closer to deadline) jobs come down in bulk, preempting all work in progress, all of the new jobs then compete nearly simultaneously for hard drive I/O...and I don't have to state that projects like CEP2 are disk I/O intensive. Having all available cores in lockstep in terms of job start times, and thus "progress" percentages, smashes efficiency as reads and, more particularly, writes end up competing for I/O.
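A server-side version of the 50%-of-cores cap suggested above could look roughly like this. To be clear, this is a hypothetical sketch, not anything in the actual BOINC scheduler; the function and parameter names are invented for illustration:

```python
# Hypothetical sketch of the proposed cap: when handing out work with
# near-real-time deadlines, never let such tasks occupy more than half
# of a host's configured cores. Not real BOINC scheduler code.

def urgent_task_quota(configured_cores: int, running_urgent: int) -> int:
    """Return how many more near-deadline tasks may be sent to this host."""
    cap = max(1, configured_cores // 2)   # at most 50% of cores, but at least 1
    return max(0, cap - running_urgent)
```

With this in place, an 8-core host already running four near-deadline CEP2 tasks would receive no more of them until one finished, leaving the other cores free to make steady progress without thrashing the disk.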

I run RAID 10/1+0 sets for BOINC data generally...I shudder to think of single drive systems attempting to deal with that amount of simultaneous I/O.
[Feb 2, 2013 9:09:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

And to add a bit of emphasis to "I see the consequences of the scheduler's thoughtlessness a lot.":

All systems here have a 0.4 day additional work buffer in order to eliminate the possibility that jobs/tasks get "stale"/approach deadline because they're waiting in queue at my end.
[Feb 2, 2013 9:36:14 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

You didn't say the main buffer size, but if BoincTask is anything to go by, the #9 computer has over 80 days cached and is an 8-way system... indicating you're running with a 10-day cache size on a project where the longest deadline is also 10 days...
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Feb 2, 2013 10:27:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

There will always be cases where a cruncher fails to match his/her variables (settings, preferences, BOINC controls, and the like) to the requisites of his/her specific objectives. If all processes play by the rules, high-priority WUs should not add any more to the difficulty of maintaining that match. But are all processes playing by the rules? What is the rule-set for high-priority WUs? It is apparent, and there are a number of testimonies, that high-priority WUs interrupt ongoing WU execution seemingly without regard for the state of the WUs they are interrupting -- as if these high-priority WUs either have no rules to follow, or are simply unaware of the impact they have on a cruncher's efforts to match variables with objectives.
;
; andzgridPost#866
;
[Feb 3, 2013 1:13:32 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

Are all processes playing by the rules? What is the rule-set for high-priority WUs?

v6.10.xx has never worked correctly, so apart from setting a smaller cache, moving to a different client is highly recommended, especially for anyone taking part in multiple BOINC projects.

As for the rules for running high priority on non-v6.10.xx clients, these have changed many times but should now be roughly:
1: If a task has less than "Connect about every..." until its deadline, run it HP; in reality it's run Earliest Deadline First.
2: If the estimated run-time puts a task past its deadline, run it HP.
3: If more tasks than #cores have the same deadline, don't swap between them while running HP, but finish the started ones first.
4: GPU tasks have priority over CPU tasks, meaning if you've configured all cores to be given to the GPU, the CPU tasks won't run even if past their deadline.

v6.10.xx doesn't follow #1 nor #3, and will, for example, choose to run work with 1+ month until its deadline even if some other work is past its deadline.
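Rules 1-3 above amount to an Earliest-Deadline-First pass over the queue. Here is an illustrative sketch of that logic; this is not BOINC source code, and the task fields and threshold names are assumptions made up for the example:

```python
# Illustrative EDF sketch of rules 1-3 above (not actual BOINC client code).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float      # seconds from now until the deadline
    est_runtime: float   # estimated seconds of work still needed

def needs_high_priority(task: Task, connect_interval: float) -> bool:
    """Rules 1 and 2: run HP if the deadline is closer than the connect
    interval, or if the remaining run-time would overshoot the deadline."""
    return task.deadline < connect_interval or task.est_runtime > task.deadline

def pick_next(tasks: list[Task], connect_interval: float) -> Task:
    """Among tasks needing HP, run the earliest deadline first (rule 3's
    'finish started ones first' tie-breaking is omitted for brevity);
    otherwise just take the first queued task as a FIFO stand-in."""
    urgent = [t for t in tasks if needs_high_priority(t, connect_interval)]
    if urgent:
        return min(urgent, key=lambda t: t.deadline)
    return tasks[0]
```

The v6.10.xx complaint, in these terms, is that it would return a task with a month of slack even while `urgent` was non-empty.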
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Feb 3, 2013 2:15:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

You didn't say the main buffer size, but if BoincTask is anything to go by, the #9 computer has over 80 days cached and is an 8-way system... indicating you're running with a 10-day cache size on a project where the longest deadline is also 10 days...
Albeit I stated at the bottom of my second post
All systems here have a 0.4 day additional work buffer
here are the actual preference tabs from system 9 for perusal


All systems are configured 'almost' identically (disk space allotments may vary between 50 GB and 100 GB while I tend to tell 12 GB and higher physical memory systems they can use 90% of memory at any time rather than 75%...rather pointless).

System 9 is an i7-2600K while 7 is an i7-980x (thus perception of actual work in queue must account for the power/speed of the systems; a CEP2 task on system 9 completes in approximately 6.5 hours, times 7 or 8 cores depending upon HCC GPU and core usage), while 6 and 8 are slower (a Q9550 and i7-950, respectively).
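To make that back-of-envelope explicit, here is a throw-away drain-time calculation; the 6.5-hour figure is from the post above, while the function and default values are merely illustrative:

```python
# Rough wall-clock drain time for a queue of CEP2 tasks.
# 6.5 h/task is the figure quoted above for system 9; defaults are assumptions.

def drain_days(tasks: int, hours_per_task: float = 6.5, cores: int = 8) -> float:
    """Days needed to finish `tasks` CEP2 tasks running `cores` wide."""
    return tasks * hours_per_task / cores / 24.0
```

So a queue that looks alarmingly deep on a slow host may represent only a few days of wall-clock work on an i7-2600K.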

Is it possible that CEP2 tasks in queue are getting bumped by the download of "higher priority"/closer-to-deadline CEP2 tasks, downloaded when the far easier/faster HCC GPU tasks complete and send a new task request to the scheduler?
[Feb 3, 2013 8:09:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

v6.10.xx has never worked correctly, so apart from setting a smaller cache, moving to a different client is highly recommended, especially for anyone taking part in multiple BOINC projects.
Can I install the latest, greatest BOINC (released/recommended) client over an existing install without risking CEP2 computational integrity given that CEP2 still recommends 6.10.58?

If so would the best path be [drain current work by setting "No New Tasks" and then as systems run dry] install latest over top of existing?

If not, would the best path be [drain current work by setting "No New Tasks" and then as systems run dry] uninstall the current client, then install the new client, setting the data directory to the existing BOINC data directory (which isn't the default anywhere anyway)?

And if [drain current work by setting "No New Tasks" and then as systems run dry] is inappropriate, should I just set "No New Tasks", abort current work, suspend projects...i.e., empty everything out...and then upgrade or uninstall/install new as appropriate?

If this is in a FAQ somewhere, pardon me for not RTFM...even call me lazy, if you'd like - I'd just like to avoid chasing down what is likely to be scattered information.
[Feb 3, 2013 8:20:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

Development of client scheduling has advanced so much, including the addition of an on-client ability to restrict the number of concurrent tasks of an individual science [app_config.xml], that it's really not worth anyone's time debugging a 6.10.xx issue. Recommend you move your clients to version 7.0.42 or up. No troubles here with either that or my present 7.0.47 on octos on W7 and W8 [7.0.39 on Linux Ubuntu and Fedora].
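For reference, the per-science concurrency limit mentioned above is set with an app_config.xml file placed in the project's directory under the BOINC data directory. The element layout below follows the documented BOINC client format, but the application name is a guess; check client_state.xml for the exact short name on your host:

```xml
<!-- app_config.xml: cap how many CEP2 tasks run at once.
     "cep2" is an assumed app name; verify it in client_state.xml. -->
<app_config>
  <app>
    <name>cep2</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

Capping a disk-heavy science like CEP2 to two or three simultaneous tasks is exactly the kind of relief from I/O contention the opening post was asking the scheduler for, except applied client-side.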

Per knreed, a very large body of volunteers has already moved to the 7 series [still in internal alpha testing, primarily for code auditing and putting the WCG skin on]. Individuals should not feel restrained from upgrading; it's the large organizations that do mass roll-outs, and the companies that volunteer large device counts, that are the main reason for the in-depth checking that whatever gets the final seal of approval will stand up for a longer period [and will not cause IBM themselves internal issues across tens of thousands of seats... their version goes into their software catalogue].

(The project properties screen may reveal a screwed DCF, but we already know about that from prior reports [which I did not comprehend] of TTC's flip-flopping between showing the CPU times and the GPU times [it was twilyth, IIRC]).
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 3, 2013 8:27:45 PM]
[Feb 3, 2013 8:25:39 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

Recommend you move your clients to version 7.0.42 or up.
Works for me - but which is best: uninstall 6.10.58 and then install a 7 client, or install a 7 client over the existing 6 client?

I've about decided to abort currently queued and WIP tasks; it may cause a lot of head scratching (and/or cursing) among the shrinking pool of wingmen my task %_01/02/03/04 identifiers suggest still exist, but...from my point of view that's easiest, as I don't have to monitor systems going idle while tasks run dry.
[Feb 3, 2013 8:44:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: It would be nice if the scheduler thought about efficiency...

There's no need to uninstall and reinstall. Just run the 7.0.xx installer and it takes care of uninstalling the previous version first. I think this is loss-less, which was not the case with the early 7.0.xx clients.

An aside: I've done some over-the-top cache testing, up to 10 days with 7.0.45, panic breaking loose, but never saw more than the number of cores in "waiting to run" while the High Priority/EDF state was active. Afterward, set the MinB [Minimum work buffer] to zero and all panic should eventually cease. Just abort the oldest not-yet-started work, for as much as you know the remaining queued work will complete in time. There's no reason for what has already started to be aborted.

BTW, there's definitely no loss-less way back to 6.xx once you've done the 7.0.xx upgrade. AND, there's a draft FAQ explaining old and new buffering. The Additional Buffer has become a type of overflow: put out a work call and get as much as possible within cache-setting constraints, but if the MaxAB [Maximum Additional Work Buffer] does not get fully stocked, the client still does not ask for more, and waits until the queue has dropped below the MinB setting level... this means that if BOINC goes wild, it will only do it for one call in your case, since as I understand it your MinB [fka "Connect about every"] is set to zero.
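That two-level buffering behaviour can be sketched as a tiny function. This is an illustration of the behaviour as described above, not BOINC's actual fetch code, and all names are invented:

```python
# Sketch of the min-buffer / additional-buffer work-fetch rule described
# above (illustrative only; not BOINC client source).

def work_request(queued_days: float, min_buf: float, max_extra: float) -> float:
    """Days of work to request. Ask only once the queue drops to or below
    the minimum buffer; then try to fill up to min_buf + max_extra in a
    single call, rather than topping off continuously."""
    if queued_days > min_buf:
        return 0.0                      # still stocked: no request
    return (min_buf + max_extra) - queued_days
```

With MinB at zero, the client asks only when it is essentially out of work, and any "wild" over-fetch is confined to that single call, which matches the description above.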
[Feb 3, 2013 9:01:41 PM]