Thread Status: Active
Total posts in this thread: 20
This topic has been viewed 2472 times and has 19 replies
Movieman
Veteran Cruncher
Joined: Sep 9, 2006
Post Count: 1042
Status: Offline
Re: Late task handling

If a job is pre-empted and there is enough swap file space, I'd have thought that the paused task's RAM segment is moved to the swap file. For Windows the default swap size is about 1.5 times RAM, space galore. At 64 times 35MB (that's what HCC is taking at peak on my W7 ATM), that would be 2.2 Gig. With HFCC, peaking at 165MB, that's 11GB, still within the 16GB you've got on that crunching monster.
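The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the task count of 64 and the per-task peaks are the figures quoted in the post, the helper name `total_footprint_gb` is made up for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
# Back-of-the-envelope check of the memory figures quoted in the post.
# Assumption: 64 tasks resident at once, each at its quoted peak usage.

def total_footprint_gb(task_count: int, peak_mb_per_task: float) -> float:
    """Combined peak memory of all resident tasks, in decimal GB."""
    return task_count * peak_mb_per_task / 1000

hcc_gb = total_footprint_gb(64, 35)    # HCC at about 35 MB peak per task
hfcc_gb = total_footprint_gb(64, 165)  # HFCC at about 165 MB peak per task

print(f"HCC:  {hcc_gb:.1f} GB")
print(f"HFCC: {hfcc_gb:.1f} GB")
```

The HFCC case works out to a little under the 11GB quoted, so the conclusion (it fits within 16GB) holds either way.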

Setting to check: Is RAM permission during work/idle set to 100% or close to that?

BTW, Murphy was listening in... now got 6 HCC running in HP on the octo, due in 2:17 days, with 1.5 cache MinBuffer (which is aka "connect every..."). No disk thrashing/swapping... all operation within the 7GB permitted, including the 6 that were preempted.

At the time I saw this I was helping out another guy get his CFSW (I think) badge: short WUs but LOTS of memory usage. Now back on my account doing a mix of projects and using 5.6 gig of RAM.
Single 128 gig Crucial M4 SSD on this machine showing 21.5 gig free on the drive.
I still think that SOME sort of limitation should be put into place for the priority WU. How about a max of "1/2 available cores" or threads? That would keep it in check even for the people with limited RAM.
RAM permission is set to 100%, BUT I just checked and it was set for a 5 day cache, not 3, so I just changed it down. That may have been part of the issue I saw.
Who's Murphy? biggrin
And a tease: People still using those "old" Octo machines?
I'm out looking for a pair of the upcoming 15 core IB Xeons: 30 cores/60 threads, pops right into this board, and only 130W vs the E5-2687W's 150W.. devilish
Thanks again for the info..Appreciated.
----------------------------------------

[Sep 25, 2012 12:16:26 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7851
Status: Offline
Re: Late task handling

Who's Murphy? biggrin
And a tease: People still using those "old" Octo machines?
I'm out looking for a pair of the upcoming 15 core IB Xeons: 30 cores/60 threads, pops right into this board, and only 130W vs the E5-2687W's 150W.. devilish


Murphy is a reference to "Murphy's Law": whatever can go wrong will go wrong, at the worst possible time.

Tease: Some of us use the "octo" machines because they are the most efficient crunchers available, at least for me. About $100 US invested in it. Boy do I wish I had $15,000 US to invest in the type of machine you are going to build. You leave me drooling with those specs.

Cheers.
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 25, 2012 12:53:01 PM]
mikey
Veteran Cruncher
Joined: May 10, 2009
Post Count: 826
Status: Offline
Re: Late task handling

Hi all,
Just this past weekend I saw BOINC Manager do some really screwy things.
On a dual E5-2687W with 32 threads I had a 3 day work load, also set to contact at 0.1 day intervals.
When I was approaching 3 days left and seeing what I thought to be more work than I could do in that timeframe, I set BOINC to "no new work" and thought I was covered, BUT BOINC makes the decision to do the work out of order in terms of report deadlines, and then yesterday I see maybe 80 WU have disappeared. There is a flaw here and it's twofold: the doing of the work out of deadline order, and then the wildcard of BOINC seeing your machine as a "trusted machine", for lack of a better word, and flooding you with high priority WU, which then suspends to memory those already being worked on. That not only messes with what you'd already set up but also puts a HUGE load on the memory in the system. This machine has 16 gig of DDR3-1333 ECC REG and I have seen the memory load at over 15 gig because of this flaw. My suggestion is no more than one high priority WU to any single machine, and set BOINC to only do WU based on the time to deadline.
Thanks for reading.


Boinc does NOT always run tasks in return by date order, it runs tasks based on the priority assigned by the format the programmers put in it. As you get closer to the return by date Boinc WILL get more into a 'logical' to us humans order, but that is NOT the normal way Boinc works.

As for flooding you with work, you COULD upgrade the memory to 32 or even 64 gb but that is your choice just as it is to keep the units in memory when suspended. YES there is good reason to do this but not if it is causing other problems. As far as only sending one unit to a 'trusted' machine, that would seem to be hoping that there are TONS of other 'trusted' machines and that may simply not be the case. 'Trusted' machines are usually machines that can be 'trusted' to return a unit in a short amount of time AND not have any problems with the units. YES obviously there is more than one 'trusted' machine here at WCG, we ALL like to think our machines are in that category, but the plain fact is that they are not ALL 'trusted'. Sometimes there are more units needed to be resent than other times, you seem to have gotten a large stack of them. Do you crunch for ALL projects here at WCG, or just some? Obviously if only some that will reduce the amount of units you can potentially get. BUT it may also increase the number of units you get as your pc is in the 'trusted' group and there may not be that many others in those projects.

There are just soooo many variables that simple answers are just not possible, even though we would LOVE to have them. It is kind of like asking someone to point to the ocean: no matter which way they point, there is an ocean in that direction! Asking them to point to the CLOSEST ocean results in a much clearer answer, but is that as the crow flies or by following the road? There is a Boinc Mailing List run by the Boinc Programmers at Berkeley and specifically by Dr. David Anderson, the original writer of Boinc. It is open to anyone, but be aware that Boinc is Dr. Anderson's baby and he does NOT take criticism well. There are over half a dozen programmers helping him keep Boinc updated, but Dr. Anderson is the main man. I have personally always found Senior Programmer Rom Walton to be VERY open and helpful, but even he sometimes does what he is told as he is not his own boss. Boinc was first created, for the public, in 1999 and it has gone thru MANY changes, for MANY different reasons, to get to where it is today. The mailing list can be helpful in seeing what is going on and also in seeing some of the reasons why things are the way they are.
----------------------------------------


[Sep 25, 2012 1:35:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Late task handling

Boinc does NOT always run tasks in return by date order, it runs tasks based on the priority assigned by the format the programmers put in it. As you get closer to the return by date Boinc WILL get more into a 'logical' to us humans order, but that is NOT the normal way Boinc works.


The part bolded by me might need some serious translation, for it's lost on me: When *not* in panic state [Earliest Deadline First], a project [WCG as a whole] runs in FIFO order, and if more active projects are attached to a client, they run in so-called round-robin, no matter the deadline. Read http://boinc.berkeley.edu/boinc_papers/sched/paper.pdf for concepts. Priorities can only be influenced by short deadlines, and are pretty much non-impacting/meaningless if running a zero to small buffer. On the server side, the techs set a feeder priority, which determines when something is pushed ahead in the hopper queue. They use this for repair tasks and to prioritize the A/B/C types of the DDDT2 project. The client is not aware of these priority codes; it merely works off the principle of delivering all assignments before their required deadline in the least chaotic manner [which it will look like anyhow to the layman observer].
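To make the FIFO versus "panic state" idea concrete, here is a toy sketch for a single project. It is not the BOINC client's actual code: the `Task` fields, the function names and the simple simulation are all assumptions made for illustration, and round-robin between projects is left out.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    received: float    # hours since a common reference point
    deadline: float    # hours since the same reference point
    est_hours: float   # estimated remaining run time

def misses_deadline_in_fifo(tasks, now, cores):
    """Simulate FIFO on `cores` cores; True if any task would finish late."""
    free_at = [now] * cores
    for t in sorted(tasks, key=lambda t: t.received):
        i = free_at.index(min(free_at))   # next core to become free
        free_at[i] += t.est_hours
        if free_at[i] > t.deadline:
            return True
    return False

def run_order(tasks, now=0.0, cores=1):
    """FIFO normally; earliest deadline first if FIFO would miss a deadline."""
    if misses_deadline_in_fifo(tasks, now, cores):
        return sorted(tasks, key=lambda t: t.deadline)   # "panic" mode
    return sorted(tasks, key=lambda t: t.received)       # normal mode

# Hypothetical queue: "b" arrived later but has a much shorter deadline.
tight = [Task("a", 0, 100, 10), Task("b", 1, 12, 5)]
relaxed = [Task("a", 0, 100, 10), Task("b", 1, 50, 5)]

print([t.name for t in run_order(tight)])
print([t.name for t in run_order(relaxed)])
```

In the tight queue, plain FIFO would finish "b" after its deadline, so the sketch switches to deadline order. In the relaxed queue nothing is at risk and arrival order is kept, which matches the "no matter the deadline" remark above.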
[Sep 25, 2012 1:48:06 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7851
Status: Offline
Re: Late task handling

For the most part, I have observed that BOINC does run in FIFO order, except when a repair task or beta unit is present. At such a time it may suspend a running task and start the new one immediately, or it may wait until the first of the running tasks is finished and then start. As a rule these tasks always have shorter deadlines. However, on my one octo machine running Linux Mint 12 (6.10.58) with a 3 day cache setting, I have noticed some different behavior, which only seems to have been occurring for the last month or so. Because I only upload/download every three days, big chunks of the available work units all have the same return date, with the time only varying by a few minutes. Randomly this machine will suspend all eight running units at various stages of completion and start other units with the same date/time. These run in high priority mode. Once it finishes these units it will gradually return to the suspended units and finish them. There is never any danger that these units (or the entire cache) will not finish in time. Two things come to mind on why this may occur. The first is that a few work units turn out to run longer than the original estimates, so BOINC readjusts upward how long the rest of the waiting units are expected to run and starts some in high priority mode. If these run shorter, it readjusts again to lower the expected run time of the rest of the cache and goes back to normal mode. The variability of the run time of the work units may be the culprit. The project run on this machine is HFCC, with 150 to 200 tasks in the queue.
The second item may be what mickey159b states:
it runs tasks based on the priority assigned by the format the programmers put in it.
There is something embedded in the work unit(s) which specifies a time which BOINC algorithms read which causes the hurry up behavior.
At any rate I just leave it alone and it gets everything done in a timely fashion.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 25, 2012 2:55:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Late task handling

Quote "There is something embedded in the work unit(s) which specifies a time which BOINC algorithms read which causes the hurry up behavior."

Not aware of anything inside the task header that makes a task hurry up per se. There's an estimated FPOPS, which translates to an estimated run time based on the device benchmark, and a deadline, but that's about it. Running a 3 day cache and reporting in big chunks once every 3 days, and considering that BOINC uses a safety margin for completion, yes, a 3 day cache is a sure-fire way for any 4 day jobs to be treated with priority at some point, and even some regular ones if the next re-connect is a known time in the future that is more than half of the remaining time to deadline of any "Ready to start" task. If there is an unexpectedly long-running task in that 3 day cache that suddenly causes the DCF to inflate, the condition will deteriorate further.
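A rough sketch of how the estimate described above behaves, assuming the usual relation of estimated FPOPS divided by the benchmarked speed, scaled by the DCF (duration correction factor). The function names and every number below are hypothetical, picked only to show the effect of one long-running task inflating the DCF.

```python
def estimated_runtime_s(fpops_est: float, benchmark_flops: float, dcf: float = 1.0) -> float:
    """Estimated run time in seconds: work estimate / device speed, scaled by DCF."""
    return fpops_est / benchmark_flops * dcf

def queue_hours(n_tasks: int, fpops_est: float, benchmark_flops: float,
                dcf: float, cores: int) -> float:
    """Estimated wall-clock hours to clear the whole queue on `cores` cores."""
    per_task_s = estimated_runtime_s(fpops_est, benchmark_flops, dcf)
    return n_tasks * per_task_s / cores / 3600

# Hypothetical figures: 175 queued tasks on an 8-core machine.
FPOPS_EST = 3.6e13   # assumed work estimate per task
BENCHMARK = 2.5e9    # assumed benchmarked floating point ops per second

normal = queue_hours(175, FPOPS_EST, BENCHMARK, dcf=1.0, cores=8)
inflated = queue_hours(175, FPOPS_EST, BENCHMARK, dcf=1.5, cores=8)

print(f"DCF 1.0: {normal:.2f} h of queued work")
print(f"DCF 1.5: {inflated:.2f} h of queued work")
```

With these made-up figures the same queue grows from 87.5 to 131.25 estimated hours when the DCF goes from 1.0 to 1.5, which is the kind of jump that can push waiting tasks into high priority.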

That said, and to repeat a base WCG rule, a device that consistently returns tasks later than 2 days from receipt should not be getting any 4 day deadline CPU tasks, with the exception of BETA. If the device gets them nonetheless, something is broken on the server **. They [BETA] go to anyone asking for work which is only fair.

All in all, there are too many versions in circulation and too many quirks, so the general advice is to just accept how BOINC orders things. For sure, I'm not going to attempt to know for each version how the nitty gritty logic [NGL] is operating, just the outlines, which have been reasonably consistent through the BOINC ages. I'll leave the NGL to the Inglisede's of the crunching world.

** Changing cache from < 2 days to > 2 days requires some time to pass before the server realizes that a client no longer meets one of the "repair" criteria.
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 25, 2012 4:23:52 PM]
[Sep 25, 2012 3:22:59 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7851
Status: Offline
Re: Late task handling

You are correct in that I have never seen a repair unit on this machine.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 25, 2012 4:05:18 PM]
Movieman
Veteran Cruncher
Joined: Sep 9, 2006
Post Count: 1042
Status: Offline
Re: Late task handling

So much info.. lol, and all appreciated. And Mike, I've been here since 2006 and been crunching since 2003, so yes, I am aware of the who's and what's of Boinc, but thank you anyway for the history lesson. As to adding memory to the current 16 gig, price DDR3-1333 ECC REG. It's not like buying desktop memory.
The 16 gig I have was almost $700.00 last April when this was built.
Sek, as I mentioned earlier, I'd set preferences for FIVE days as I was helping one of the guys on his badge and forgot to set it back to my normal 3 days, and that in itself may be the cause of what I saw. It's back at 3 days now. Sorry for taking everyone's time on this, and I greatly appreciate all the info you guys have bestowed on this old fart.. biggrin
----------------------------------------

[Sep 26, 2012 2:07:39 AM]
mikey
Veteran Cruncher
Joined: May 10, 2009
Post Count: 826
Status: Offline
Re: Late task handling



Boinc does NOT always run tasks in return by date order, it runs tasks based on the priority assigned by the format the programmers put in it.


The part bolded by me might need some serious translation, for it's lost on me: When *not* in panic state [Earliest Deadline First], a project [WCG as a whole] runs in FIFO order, and if more active projects are attached to a client, they run in so-called round-robin, no matter the deadline.



THANK YOU....for the life of me I could NOT remember FIFO when typing all that!!! I am getting frickin old!!!!
----------------------------------------


[Sep 26, 2012 8:26:53 PM]
Movieman
Veteran Cruncher
Joined: Sep 9, 2006
Post Count: 1042
Status: Offline
Re: Late task handling

All issues sorted and the machine came in at number 10 project wide for the last week with a 138,599 PPD average.
Very happy with those numbers for now.
Next machine should double those numbers if my math is correct. biggrin
----------------------------------------

[Oct 1, 2012 5:44:26 PM]