Thread Status: Active
Total posts in this thread: 98
This topic has been viewed 16472 times and has 97 replies
RTorpey
Advanced Cruncher
Joined: Aug 24, 2005
Post Count: 67
Status: Offline
Re: Long computing times with short Due date?

But that doesn't acknowledge the impact this has on other projects. If it's always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because FAAH2 can't forecast their work properly.
[Oct 6, 2015 1:59:41 AM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4852
Status: Offline
Re: Long computing times with short Due date?

But that doesn't acknowledge the impact this has on other projects. If it's always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because FAAH2 can't forecast their work properly.

Why should it run at high priority? I've been running this with OET1 and haven't gone to high priority yet. It's just a matter of not keeping the cache too large.
----------------------------------------
4720 Yrs
[Oct 6, 2015 2:44:57 AM]
pvh513
Senior Cruncher
Joined: Feb 26, 2011
Post Count: 260
Status: Offline
Re: Long computing times with short Due date?

I now got this WU
FAH2_ avx17287-ls_ 000085_ 0014_ 001_ 1--  	In Progress 	10/5/15 17:59:07 	10/7/15 03:35:06 	9.03 / 0.00 	77.7 / 0.0

Note the very short difference between sent time and return time: less than 34 hours! With this kind of return time, the job immediately goes into high priority mode when it is received, regardless of your queue settings. OK, this kind of WU seems the exception, but it is jumping the queue...
[Oct 6, 2015 4:45:08 AM]
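The "less than 34 hours" figure in the post above can be checked directly from the two timestamps on the work unit line. This is only a quick arithmetic sketch; it assumes the timestamps are month/day/year on a 24-hour clock and that the first is the sent time and the second the due time, as the post describes.

```python
from datetime import datetime

# Sent and due timestamps copied from the work unit line in the post.
# Assumption: month/day/two-digit-year, 24-hour clock.
FMT = "%m/%d/%y %H:%M:%S"
sent = datetime.strptime("10/5/15 17:59:07", FMT)
due = datetime.strptime("10/7/15 03:35:06", FMT)

# Length of the return window in hours.
window_hours = (due - sent).total_seconds() / 3600
print(f"Return window: {window_hours:.1f} hours")  # Return window: 33.6 hours
```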
KLiK
Master Cruncher
Croatia
Joined: Nov 13, 2006
Post Count: 3108
Status: Offline
Re: Long computing times with short Due date?

Fact: At WCG the client v7 does -NOT- learn about estimated run-times i.e. will not adjust them based on client specific actual runtimes. They are fully controlled and adjusted by the WCG scheduler system with each new assignment, based on server current validated runtime averages.

deltavee has his explanation right on the button. It's irrelevant whether you complete part or whole of an assignment. If not completing a percent or whole by the deadline minus N hours, the uncompleted part is packaged into a follow-on task and the slow boat machine gets a cut-off instruction. This ensures the pace of progression from step 1 to step 3 million [or however many the scientists decide on], stays on track. That track is currently a theoretical -maximum- of ~120 days long to get to step 3 million. Practically/Statistically it will likely be sooner as when my host receives and returns 100K steps within 24 hours [which it does], this gains 3 days on the timeline. If then followed by a straggler that does not do anything by say day 4, the sequence at that point in time is still on schedule.

Handing out 8 days' worth of WUs with a 5-day limit will get all of us cut off! That is the main problem for me now...
----------------------------------------
oldies:UDgrid.org & PS3 Life@home


non-profit org. Play4Life in Zagreb, Croatia
[Oct 6, 2015 7:53:35 AM]
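The "~120 days" ceiling quoted in the post above seems to follow from simple arithmetic, sketched below. The 3 million steps and the 100K steps per task come from the quoted text; the 4-day deadline per task is an assumption, taken from the "gains 3 days" remark about a 24-hour return and from the 4-day deadline mentioned elsewhere in the thread.

```python
# Figures from the quoted explanation.
TOTAL_STEPS = 3_000_000
STEPS_PER_TASK = 100_000

# Assumption: each task in the chain has a 4-day deadline.
DEADLINE_DAYS = 4

# Number of consecutive tasks needed to reach the final step.
tasks_in_chain = TOTAL_STEPS // STEPS_PER_TASK

# Worst case: every task in the chain uses its full deadline.
max_days = tasks_in_chain * DEADLINE_DAYS

# A host that returns a task within 1 day saves the rest of that deadline.
days_gained = DEADLINE_DAYS - 1

print(tasks_in_chain, max_days, days_gained)  # 30 120 3
```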
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4852
Status: Offline
Re: Long computing times with short Due date?

I now got this WU
FAH2_ avx17287-ls_ 000085_ 0014_ 001_ 1--  	In Progress 	10/5/15 17:59:07 	10/7/15 03:35:06 	9.03 / 0.00 	77.7 / 0.0

Note the very short difference between sent time and return time: less than 34 hours! With this kind of return time, the job immediately goes into high priority mode when it is received, regardless of your queue settings. OK, this kind of WU seems the exception, but it is jumping the queue...
And you should only get a workunit like this if you are a reliable computer and have been returning your workunits on time.
----------------------------------------
4720 Yrs
[Oct 6, 2015 9:54:55 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Long computing times with short Due date?

But that doesn't acknowledge the impact this has on other projects. If it's always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because FAAH2 can't forecast their work properly.

Plus, if a project is eventually overworked, the client stops fetching jobs from that project and another gets its chance to catch up.

Anyone who runs a buffer that is less than half of the shortest project deadline, i.e. for FAHB with its standard 4-day deadline, 4 / 2 = under 2 days, will hardly see any HP processing. Those with a 'reliable' host will by definition be running a buffer under 2 days, as otherwise they would not receive the repair jobs, and these fast returners hardly care about jobs jumping the queue anyway. The client manages this quite well, except when micro-managers continue to interfere with the FIFO/EDF scheduling.
[Oct 6, 2015 10:34:54 AM]
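The rule of thumb in the post above (keep the buffer under half of the shortest deadline) can be written as a one-line check. This is only a sketch of the poster's guideline, not the BOINC client's actual scheduling logic; the function name and the 10-day deadline of the second project are hypothetical.

```python
def little_high_priority_expected(buffer_days, deadlines_days):
    """Poster's rule of thumb: buffer below half of the shortest deadline."""
    return buffer_days < min(deadlines_days) / 2


# 4 days is the FAHB deadline quoted in the post; 10 days is a
# hypothetical deadline for a second project, used for illustration only.
deadlines = [4, 10]

print(little_high_priority_expected(1.0, deadlines))  # True
print(little_high_priority_expected(2.5, deadlines))  # False
```

As the replies further down show, this only holds if the server sends an amount of work that matches the buffer setting.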
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Long computing times with short Due date?


Plus, if a project is eventually overworked, the client stops fetching jobs from that project and another gets its chance to catch up.

Anyone who runs a buffer that is less than half of the shortest project deadline, i.e. for FAHB with its standard 4-day deadline, 4 / 2 = under 2 days, will hardly see any HP processing. Those with a 'reliable' host will by definition be running a buffer under 2 days, as otherwise they would not receive the repair jobs, and these fast returners hardly care about jobs jumping the queue anyway. The client manages this quite well, except when micro-managers continue to interfere with the FIFO/EDF scheduling.

This appears not to be the case in every instance. I have a fast machine that was set with a 1/2 day cache. It downloaded 120 tasks 2 days ago (10/4) with a 4 day deadline. Every task has been running @ high priority since 10/4. I'm guessing that is the case because the client realized there is no way that machine will finish 120 tasks in 4 days. FWIW I had the same issue on a second machine with a 1/2 day cache that downloaded over 100 tasks on 10/1 with a 4 day deadline. They also all ran on high priority until the deadline at which time those that hadn't started went to no reply status. I'm sure the same thing will happen again to the tasks that are due 10/8. Why I was sent so many tasks at 1 time with such a small cache setting needs to be addressed. Makes those machines look unreliable.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edit 3 times, last edit by nanoprobe at Oct 6, 2015 12:43:53 PM]
[Oct 6, 2015 12:34:21 PM]
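Whether 120 tasks can finish inside a 4-day deadline depends on the per-task run time and the number of cores, neither of which is given in the post above. The sketch below shows the kind of estimate involved; the 9 hours per task and the 8 cores are hypothetical values chosen only for illustration, and the function names are made up for this example.

```python
import math


def wall_clock_hours(task_count, hours_per_task, cores):
    """Hours to finish all tasks if each core runs one task at a time."""
    waves = math.ceil(task_count / cores)  # rounds of tasks run in parallel
    return waves * hours_per_task


def fits_deadline(task_count, hours_per_task, cores, deadline_days):
    """True if the whole batch finishes before the deadline expires."""
    return wall_clock_hours(task_count, hours_per_task, cores) <= deadline_days * 24


# 120 tasks and the 4-day deadline are from the post.
# 9 hours per task and 8 cores are HYPOTHETICAL illustration values.
print(wall_clock_hours(120, 9, 8))   # 135 hours needed
print(fits_deadline(120, 9, 8, 4))   # False: 135 hours > 96 hours
```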
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Long computing times with short Due date?

I'm seeing the same thing as nanoprobe with my 1 day cache and have several FAAH2 WUs running at high priority. I'm running a mix of OET1 and FAAH2. The OET1 units have some variability to them and the client doesn't respond quickly to the changes in run times. Some FAAH2 units sit in the queue for a day or more because the client downloads as much as 32 hours of work for a 24 hour cache. Then the FAAH2 units run as long as 31 or 32 hours on the Linux machines. I haven't seen the extreme download numbers that nanoprobe has, but that may be due to my mix of work.
[Oct 6, 2015 1:17:50 PM]
KLiK
Master Cruncher
Croatia
Joined: Nov 13, 2006
Post Count: 3108
Status: Offline
Re: Long computing times with short Due date?

I now got this WU
FAH2_ avx17287-ls_ 000085_ 0014_ 001_ 1--  	In Progress 	10/5/15 17:59:07 	10/7/15 03:35:06 	9.03 / 0.00 	77.7 / 0.0

Note the very short difference between sent time and return time: less than 34 hours! With this kind of return time, the job immediately goes into high priority mode when it is received, regardless of your queue settings. OK, this kind of WU seems the exception, but it is jumping the queue...
And you should only get a workunit like this if you are a reliable computer and have been returning your workunits on time.

That's the main problem... most of our "devices" aren't reliable anymore!

Why?
Too many WUs are given out with short completion times!
----------------------------------------
oldies:UDgrid.org & PS3 Life@home


non-profit org. Play4Life in Zagreb, Croatia
[Oct 6, 2015 1:24:14 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Long computing times with short Due date?


Plus, if a project is eventually overworked, the client stops fetching jobs from that project and another gets its chance to catch up.

Anyone who runs a buffer that is less than half of the shortest project deadline, i.e. for FAHB with its standard 4-day deadline, 4 / 2 = under 2 days, will hardly see any HP processing. Those with a 'reliable' host will by definition be running a buffer under 2 days, as otherwise they would not receive the repair jobs, and these fast returners hardly care about jobs jumping the queue anyway. The client manages this quite well, except when micro-managers continue to interfere with the FIFO/EDF scheduling.

This appears not to be the case in every instance. I have a fast machine that was set with a 1/2 day cache. It downloaded 120 tasks 2 days ago (10/4) with a 4 day deadline. Every task has been running @ high priority since 10/4. I'm guessing that is the case because the client realized there is no way that machine will finish 120 tasks in 4 days. FWIW I had the same issue on a second machine with a 1/2 day cache that downloaded over 100 tasks on 10/1 with a 4 day deadline. They also all ran on high priority until the deadline at which time those that hadn't started went to no reply status. I'm sure the same thing will happen again to the tasks that are due 10/8. Why I was sent so many tasks at 1 time with such a small cache setting needs to be addressed. Makes those machines look unreliable.

Getting 100/120+ on a half day cache is a server scheduler screw-up **, and yes, if the buffer [the sum of the TTCs] is over half the deadline of all tasks, then all tasks will run HP. With v7 the client could initially try different tasks to see if the real run time is less, but it should stop trying once the pre-empted count has reached the number of active cores.

** We've seen more of these reports, and I've seen myself how from one task to the next the TTC drops like crazy, and on the next it doubles, triples or more. A flaw in the server scheduler logic that has been there for a while now. The coq who's been dabbling the beak in the vin.

Don't know if a fetch can be capped client side, with the standard 2:01 minute deferral giving the client time to recompute the total buffer, but with these kinds of run times it's not advisable to send a boatload. Up to WCG to fix this... e.g. give no more than the total of active threads or idle devices, then back off to let the buffer wheels whir.

TTC estimates are maintained per app, so it would be really screwy if one affected the other. I'm on 7.6.3 on one machine and 7.6.9 on the other, which does FAHB, and have been spared so far [or get the non available for...]
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Oct 6, 2015 2:28:50 PM]
[Oct 6, 2015 2:24:13 PM]
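The cap suggested in the post above ("give no more than total active threads") could look roughly like the sketch below. It is not how the WCG scheduler works; the function name and parameters are hypothetical, and the 360-second run-time estimate is invented purely so that the uncapped count comes out at 120 for a half-day cache, similar to the batch size reported earlier in the thread.

```python
import math


def tasks_to_send(requested_seconds, est_task_seconds, active_threads):
    """Hypothetical server-side cap: never send more tasks than active threads."""
    # Tasks needed to fill the requested buffer at the current estimate.
    uncapped = math.ceil(requested_seconds / est_task_seconds)
    # Cap per request so a bad estimate cannot flood the host.
    return min(uncapped, active_threads)


HALF_DAY = 12 * 3600  # half-day cache, in seconds

# HYPOTHETICAL: the estimate collapses to 360 s per task on an 8-thread host.
print(math.ceil(HALF_DAY / 360))        # 120 tasks without a cap
print(tasks_to_send(HALF_DAY, 360, 8))  # 8 tasks with the cap
```

With a cap like this, a too-low estimate costs a few extra fetch cycles instead of a batch the host cannot finish by the deadline.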