World Community Grid Forums
Thread Status: Active. Total posts in this thread: 98
RTorpey
Advanced Cruncher | Joined: Aug 24, 2005 | Post Count: 67 | Status: Offline
But that doesn't acknowledge the impact this has on other projects. If FAAH2 is always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because the scheduler can't forecast FAAH2's work properly.
deltavee
Ace Cruncher | Texas Hill Country | Joined: Nov 17, 2004 | Post Count: 4852 | Status: Offline
RTorpey wrote:
"But that doesn't acknowledge the impact this has on other projects. If FAAH2 is always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because the scheduler can't forecast FAAH2's work properly."

Why should it run at high priority? I've been running this alongside OET1 and haven't gone to high priority yet. It's just a matter of not keeping the cache too large.
4720 Yrs
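To put deltavee's point in concrete terms, here is a minimal sketch of the kind of deadline test that pushes a BOINC-style client into high-priority (EDF) mode. This is not the actual client code; the function, the simple feasibility check, and the example task lengths are assumptions for illustration only.

```python
# Rough illustration: high-priority mode kicks in when the queued work can no
# longer all finish before its deadlines at normal pace. Not real BOINC logic.

def needs_high_priority(tasks, cores, on_fraction=1.0):
    """tasks: list of (remaining_hours, hours_until_deadline) tuples."""
    tasks = sorted(tasks, key=lambda t: t[1])   # earliest deadline first
    busy_hours = 0.0
    for remaining, until_deadline in tasks:
        busy_hours += remaining / (cores * on_fraction)
        if busy_hours > until_deadline:
            return True      # some task would miss its deadline -> go EDF
    return False

# A modest cache of ~7 h tasks on 4 cores comfortably meets a 4-day deadline:
print(needs_high_priority([(7, 96)] * 8, cores=4))    # False
# A bloated cache of 120 such tasks cannot, so the client would panic:
print(needs_high_priority([(7, 96)] * 120, cores=4))  # True
```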
pvh513
Senior Cruncher | Joined: Feb 26, 2011 | Post Count: 260 | Status: Offline
I just got this WU:

FAH2_avx17287-ls_000085_0014_001_1 -- In Progress 10/5/15 17:59:07 10/7/15 03:35:06 9.03 / 0.00 77.7 / 0.0

Note the very short gap between sent time and return time: less than 34 hours! With this kind of return window, the job goes into high-priority mode the moment it is received, regardless of your queue settings. OK, this kind of WU seems to be the exception, but it is jumping the queue...
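A quick check of the numbers in that status line; this is only the arithmetic spelled out, using the timestamps from the post:

```python
# Verify the "less than 34 hours" window between sent time and due time.
from datetime import datetime

sent = datetime(2015, 10, 5, 17, 59, 7)
due  = datetime(2015, 10, 7, 3, 35, 6)
window_hours = (due - sent).total_seconds() / 3600
print(f"{window_hours:.1f} h to return the task")   # ~33.6 h

# With several hours of crunching still needed and any queued work sitting
# ahead of it, even a one-day cache makes this deadline look tight, so the
# client promotes the task immediately.
```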
KLiK
Master Cruncher | Croatia | Joined: Nov 13, 2006 | Post Count: 3108 | Status: Offline
Quoting an earlier post:
"Fact: at WCG the v7 client does -NOT- learn estimated run-times, i.e. it will not adjust them based on the client's own actual runtimes. They are fully controlled and adjusted by the WCG scheduler with each new assignment, based on the server's current validated runtime averages. deltavee has his explanation right on the button. It's irrelevant whether you complete part or all of an assignment: if a percent or the whole is not completed by the deadline minus N hours, the uncompleted part is packaged into a follow-on task and the slow-boat machine gets a cut-off instruction. This keeps the pace of progression from step 1 to step 3 million [or however many the scientists decide on] on track. That track is currently a theoretical -maximum- of ~120 days to reach step 3 million. Practically/statistically it will likely be sooner: when my host receives and returns 100K steps within 24 hours [which it does], that gains 3 days on the timeline. If that is then followed by a straggler that does nothing until, say, day 4, the sequence is at that point still on schedule."

Handing out 8 days' worth of WUs with a 5-day limit will put all of us on cut-off! That is the main problem for me now...
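The timeline figures in that quote work out as follows; this is just the arithmetic made explicit, with the 100K-step batch size and the 120-day maximum taken from the post and nothing else assumed:

```python
# Minimum pace needed to reach step 3 million within the ~120-day maximum,
# and the schedule gained by a host returning a 100K-step chunk in 24 hours.

total_steps   = 3_000_000
max_days      = 120
steps_per_day = total_steps / max_days       # 25,000 steps/day minimum pace
print(steps_per_day)

gain_days = 100_000 / steps_per_day - 1      # 4 days of budget done in 1 day
print(gain_days)                             # 3.0 days gained on the timeline
```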
deltavee
Ace Cruncher | Texas Hill Country | Joined: Nov 17, 2004 | Post Count: 4852 | Status: Offline
pvh513 wrote:
"I just got this WU: FAH2_avx17287-ls_000085_0014_001_1 -- In Progress 10/5/15 17:59:07 10/7/15 03:35:06 9.03 / 0.00 77.7 / 0.0. Note the very short gap between sent time and return time: less than 34 hours! With this kind of return window, the job goes into high-priority mode the moment it is received, regardless of your queue settings. OK, this kind of WU seems to be the exception, but it is jumping the queue..."

And you should only get a workunit like this if your computer is reliable and has been returning its workunits on time.
4720 Yrs
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
RTorpey wrote:
"But that doesn't acknowledge the impact this has on other projects. If FAAH2 is always running at high priority, it pushes every other project to the back of the line. It's great that FAAH2 runs well, but what about people who participate in more than one project? The other projects now suffer because the scheduler can't forecast FAAH2's work properly."

PLUS!, eventually, if a project is overworked, the client stops fetching jobs from it and another project gets its chance to catch up. Anyone who runs a buffer of less than half the shortest deadline among their projects (FAHB's standard deadline is 4 days, so 4 / 2 = under 2 days) will hardly see any HP processing. Those with a 'reliable' host will by definition be running a buffer under 2 days, since otherwise they would not receive the repair jobs, and those fast returners hardly care about jobs jumping the queue anyway. The client manages this quite well, except when micro-managers keep interfering with the FIFO/EDF scheduling.
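SekeRob's rule of thumb, written out as a one-liner; the helper is purely illustrative (the client does not literally compute this), and the 4-day deadline comes from the thread:

```python
# Keep the total buffer (minimum + additional days of work) under half the
# shortest deadline among the projects you run to stay clear of HP mode.

def buffer_is_safe(min_days, extra_days, shortest_deadline_days):
    return (min_days + extra_days) < shortest_deadline_days / 2

# FAHB-style 4-day deadline: anything under 2 days of cache is fine.
print(buffer_is_safe(0.5, 0.5, 4))   # True  -> little or no HP processing
print(buffer_is_safe(1.0, 2.0, 4))   # False -> expect deadline jumping / EDF
```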
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998 | Status: Offline
SekeRob wrote:
"PLUS!, eventually, if a project is overworked, the client stops fetching jobs from it and another project gets its chance to catch up. Anyone who runs a buffer of less than half the shortest deadline among their projects (FAHB's standard deadline is 4 days, so 4 / 2 = under 2 days) will hardly see any HP processing. Those with a 'reliable' host will by definition be running a buffer under 2 days, since otherwise they would not receive the repair jobs, and those fast returners hardly care about jobs jumping the queue anyway. The client manages this quite well, except when micro-managers keep interfering with the FIFO/EDF scheduling."

This appears not to be the case in every instance. I have a fast machine that was set with a half-day cache. It downloaded 120 tasks two days ago (10/4) with a 4-day deadline. Every task has been running at high priority since 10/4. I'm guessing that is because the client realized there is no way that machine will finish 120 tasks in 4 days. FWIW, I had the same issue on a second machine with a half-day cache that downloaded over 100 tasks on 10/1 with a 4-day deadline. They also all ran at high priority until the deadline, at which point those that hadn't started went to "no reply" status. I'm sure the same thing will happen again with the tasks that are due 10/8. Why I was sent so many tasks at one time with such a small cache setting needs to be addressed. It makes those machines look unreliable.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 3 times, last edit by nanoprobe at Oct 6, 2015 12:43:53 PM]
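To see why those 120 tasks immediately overwhelm a host, here is a back-of-the-envelope check. The core count and per-task runtime are assumptions (the post does not give them); the conclusion holds for any plausible values.

```python
# Can 120 FAAH2-style tasks finish within a 4-day deadline on one machine?

tasks          = 120
deadline_hours = 4 * 24       # 96 h
hours_per_task = 7            # assumed average runtime per result
cores          = 8            # assumed

wall_hours_needed = tasks * hours_per_task / cores
print(wall_hours_needed, wall_hours_needed <= deadline_hours)   # 105.0 False

# Even 8 dedicated cores fall short, so the client runs everything at high
# priority, and whatever never starts ends up as "no reply" at the deadline.
```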
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I'm seeing the same thing as nanoprobe with my 1-day cache and have several FAAH2 WUs running at high priority. I'm running a mix of OET1 and FAAH2. The OET1 units have some variability to them, and the client doesn't respond quickly to the changes in run times. Some FAAH2 units sit in the queue for a day or more because the client downloads as much as 32 hours of work for a 24-hour cache. The FAAH2 units then run as long as 31 or 32 hours on the Linux machines. I haven't seen the extreme download numbers that nanoprobe has, but that may be due to my mix of work.
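The overshoot described above falls out of the fact that work is granted in whole tasks. The figures below are illustrative only (the per-task estimate and queued hours are assumptions), but they show how a "24-hour" cache can end up holding 32 hours of work:

```python
# The client requests enough seconds to top up its cache, but the scheduler
# can only send whole tasks, so one long FAAH2 result overshoots the target.

cache_hours   = 24.0
queued_hours  = 20.0          # work already on hand (assumed)
task_estimate = 12.0          # assumed estimate for one FAAH2 task

shortfall = cache_hours - queued_hours                        # asks for ~4 h
tasks_granted = -(-shortfall // task_estimate)                # ceiling: 1 task
print(queued_hours + tasks_granted * task_estimate)           # 32.0 h queued
```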
KLiK
Master Cruncher | Croatia | Joined: Nov 13, 2006 | Post Count: 3108 | Status: Offline
Quoting deltavee's reply to pvh513's short-deadline workunit (FAH2_avx17287-ls_000085_0014_001_1, sent 10/5/15 17:59:07 and due 10/7/15 03:35:06):
"And you should only get a workunit like this if your computer is reliable and has been returning its workunits on time."

That's the main problem... most of our "devices" aren't reliable anymore! Why? Too many WUs handed out with too-short completion times!
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
Replying to nanoprobe's report above (120 tasks downloaded on a half-day cache):

Getting 100/120+ tasks on a half-day cache is a server-scheduler screw-up **, and yes, if the buffer is over half the deadline of all tasks [the sum of the TTCs], then all tasks will run HP. With v7 the client could initially try different tasks to see whether the real time is less, but it should stop trying once the pre-empted count has reached the number of active cores.

** We've seen more of these reports, and I've seen myself how the TTC drops like crazy from one task to the next and then doubles/triples and more. A flaw in the server scheduler logic, and it has been there for a while. The coq who's been dabbling the beak in the vin.

I don't know if a fetch can be capped client-side (the standard 2:01-minute deferral gives the client time to recompute the total buffer), but with these kinds of run-times it is not advisable to send a boatload. Up to WCG to fix this, e.g. send no more than the number of active threads or idle devices, then back off to let the buffer wheels whir. TTC estimates are maintained per app, so it would be really screwy if one affected the other. I'm on 7.6.3 on one machine and 7.6.9 on the other that does FAHB, and have been spared so far [or get the "not available for..."].

[Edit 1 time, last edit by SekeRob* at Oct 6, 2015 2:28:50 PM]
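One way to read that "no more than active threads" suggestion, sketched out; this is only an illustration of the proposal, not anything WCG's server actually implements, and the function name is made up:

```python
# Cap any single work assignment at the number of idle threads and let the
# normal request deferral top the cache up over several scheduler contacts.

def capped_assignment(tasks_wanted, idle_threads):
    """Grant at most one task per idle thread per scheduler request."""
    return min(tasks_wanted, idle_threads)

# Instead of 120 tasks in one go, an 8-thread host gets 8, re-requests after
# the standard ~2-minute deferral, and stops once its buffer target is met.
print(capped_assignment(120, 8))   # 8
```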