World Community Grid Forums
Category: Completed Research Forum: FightAIDS@Home Phase 2 Thread: 24 Hour Deadline too Short |
Thread Status: Active Total posts in this thread: 112
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges: |
The 24 hour deadline is insane. The jobs are taking longer than 24 hours on some of my computers. No wonder there are so many resends. The deadline needs to be longer or the jobs shorter.
Cheers
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7574 Status: Offline Project Badges: |
Quote: "The 24 hour deadline is insane. The jobs are taking longer than 24 hours on some of my computers. No wonder there are so many resends. The deadline needs to be longer or the jobs shorter. Cheers"
Well, I think if the 24 hour deadline is too short for some of your machines, just don't run this project on those machines. I am only running this project on one machine (Linux) and am only seeing about 1 resend per 20 jobs. It could be different for Windows. Sometimes our hardware is just too limiting to do adequate justice to what we desire to crunch.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
tmedve
Senior Cruncher USA Joined: Nov 16, 2004 Post Count: 182 Status: Offline Project Badges: |
Out of 90 completed FAH2 that I can still see, there are only 2 where my wingman (first attempt) was "Too Late". All the rest either had a wingman error or had replication = 1. I'm using Windows 10 so I don't see the number of resends that you see.
My tasks on an i5 processor run either 1, 3 or 5 hours. And, as soon as one finishes, the system immediately sends another one to start. I am only running FAH2 and Beta on my computer. I usually pick up several Betas each day. Only 10 more days till I get my Sapphire for FAH2. :)
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges: |
Quote: "Only 10 more days till I get my Sapphire for FAH2. :)"
Congratulations on your upcoming milestone. I too am chasing the sapphire, only I still have a couple of months to go yet.
So, for the third time (and in different threads), does anybody know why the deadline is 24 hrs? Again, I'm not complaining, I'm just curious (at this rate, by the time I find out, the project will have ended).
Thanks,
CJSL
Crunching like there's no tomorrow...
[Edit 1 times, last edit by cjslman at Jan 20, 2018 12:29:28 PM]
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7574 Status: Offline Project Badges: |
Quote:
  Quote: "Only 10 more days till I get my Sapphire for FAH2. :)"
  Congratulations on your upcoming milestone. I too am chasing the sapphire, only I still have a couple of months to go yet. So, for the third time (and in different threads), does anybody know why the deadline is 24 hrs? Again, I'm not complaining, I'm just curious (at this rate, by the time I find out, the project will have ended). Thanks, CJSL Crunching like there's no tomorrow...
I think the answer to the 24 hour deadline is that the work units have such a large number of parts. For instance this one:
FAH2_ 001464_ avx17629-0_ 000008_ 000077_ 028 _ 0--
The "028" means it is the 28th part of this work unit. I seem to recall somewhere that these work units may have more than 100 parts. If that were the case, then with a 10 day deadline it could take up to 1000 days (about 2.75 years) to complete the sequence for that work unit, rather than 100 days with a 1 day deadline. I would think the scientists would like the results of the completed work units in a timelier fashion.
I believe there is some backend work involved in putting all the parts of the work unit together to get the final result. This implies that there could be some kind of storage constraint on holding the parts of the work unit until its completion.
If I am off base on this explanation, someone with more information should feel free to correct me.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
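For anyone who wants to check the arithmetic in the explanation above, here is a minimal Python sketch. It assumes the part number is the second-to-last underscore-separated field of the work unit name (as in the quoted example) and that every part uses its full deadline before the next part is issued. `part_number` and `worst_case_days` are hypothetical helper names for illustration, not anything provided by WCG or BOINC.

```python
import re


def part_number(work_unit_name: str) -> int:
    """Return the part number from a FAH2 work unit name.

    Assumption: the second-to-last underscore-separated field is the
    part number. Stray spaces (as in the forum paste) are removed first.
    """
    compact = re.sub(r"\s+", "", work_unit_name)
    fields = compact.split("_")
    return int(fields[-2])


def worst_case_days(parts: int, deadline_days: float) -> float:
    """Worst case for a sequential chain: every part returns at its deadline."""
    return parts * deadline_days


name = "FAH2_ 001464_ avx17629-0_ 000008_ 000077_ 028 _ 0--"
print("part:", part_number(name))

for deadline in (1, 10):
    days = worst_case_days(100, deadline)
    print(f"{deadline}-day deadline, 100 parts: {days:.0f} days "
          f"({days / 365:.2f} years)")
```

With 100 parts, the 10-day case comes out at 1000 days, or roughly 2.74 years, which is close to the figure quoted in the post.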
||
|
OldChap
Veteran Cruncher UK Joined: Jun 5, 2009 Post Count: 978 Status: Offline Project Badges: |
This is showing percentage complete and runtime so far:
83.19   01d,20:52:02 (01d,19:45:42)
73.21   01d,17:29:21 (01d,16:31:26)
98.55   01d,08:29:38 (01d,07:41:46)
80.12   01d,02:28:51 (01d,01:44:05)
79.19   01d,02:28:51 (01d,01:44:02)
65.14   22:28:54 (21:46:27)
26.23   14:28:32 (14:13:09)
19.33   10:28:32 (10:18:49)
29.76   03:28:58 (03:22:10)
6.14    03:28:58 (03:23:09)
61.87   02:10:15 (02:05:44)
4.04    00:29:20 (00:28:19)
7.59    00:29:20 (00:29:01)
1.52    00:29:15 (00:28:28)
I am seeing one or two of these betas that run for longer than a day on the slow machines fail with "Too Late", but for the most part they are valid. These are from yesterday:
Valid      51.43 / 52.74
Valid      31.88 / 32.79
Valid      31.15 / 31.59
Valid      32.12 / 32.61
Too Late   53.66 / 55.12
Valid      10.45 / 10.70
Valid      55.17 / 56.70
Valid      52.44 / 53.51
Valid      54.05 / 55.14
Valid      2.08 / 2.10
Valid      10.54 / 10.61
I decided to keep running these so that the techs have examples of slow machines to work with when forward planning.
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges: |
Quote: "I think the answer to the 24 hour deadline is that the work units have such a large number of parts. For instance this one: FAH2_ 001464_ avx17629-0_ 000008_ 000077_ 028 _ 0--"
Thanks for the answer. That makes sense, although I can understand why the short deadline can't be handled by everybody... like they say, it's the nature of the beast.
Thanks,
CJSL
Saving the world, one crunch at a time...
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The 24hr deadline IS absolutely insane. It needs to be bumped up to 48hrs to accommodate a queue depth of 24hrs. As it is, if you queue up 24hrs worth of work, every task for this project you get ends up being run on high priority, potentially late, stealing CPU cycles from other projects running on the same machine.
"Shorten your queue length" Except there've been sufficient outages and maintenance events on WCG to make a 24hr queue sensible. It allows, for the most part, even a major outage or maintenance to occur without interrupting processing on the client end. |
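The queue-versus-deadline point can be put into numbers with a rough sketch. This is only the arithmetic, not a model of how the BOINC client actually schedules tasks; `slack_hours` and `needs_to_jump_queue` are hypothetical names, and the 5 hour runtime is borrowed from the i5 figures mentioned earlier in the thread.

```python
def slack_hours(deadline_h: float, queue_h: float, runtime_h: float) -> float:
    """Hours to spare if a new task waits behind queue_h hours of work, then runs."""
    return deadline_h - queue_h - runtime_h


def needs_to_jump_queue(deadline_h: float, queue_h: float, runtime_h: float) -> bool:
    """True when the task would miss its deadline if it simply waited its turn."""
    return slack_hours(deadline_h, queue_h, runtime_h) < 0


for deadline in (24, 48):
    slack = slack_hours(deadline, queue_h=24, runtime_h=5)
    jump = needs_to_jump_queue(deadline, queue_h=24, runtime_h=5)
    print(f"deadline {deadline}h, queue 24h, runtime 5h: "
          f"slack {slack:+.0f}h, must jump queue: {jump}")
```

Under these assumptions a 24 hour deadline leaves negative slack behind a 24 hour queue, while a 48 hour deadline leaves 19 hours to spare.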
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline Project Badges: |
The following is from memory, not from searching the forums.
When FAHB was first introduced, a work unit consisted of 100,000 steps with a trickle up process every 10,000 steps. The next work unit was generated using the output from the returned work unit. A complete 'unit of work' consisted of a chain of 10 work units. Each work unit had a 10 day deadline, so the complete chain could take about 100 days. Each work unit needed 10 to 15 hours of CPU time to complete on a reasonably fast machine (i7-6700).
The project had strange rules which caused the server to abort running work units if it considered that the work unit would not complete before the deadline, which was not always popular. The trickle up process also caused problems for PCs which did not have a permanent internet connection, as the upload of the completed work unit had problems if all the trickle up results had not been processed. There were also problems when the upload servers were not available due to maintenance at WCG.
The current work units avoid the above problems but have the disadvantage of the 1 day deadline and the allowance of one work unit per thread. There are two types of work units, which contain either 10,000 or 50,000 steps. Looking at my results, I can see a 26th generation work unit of 50,000 steps, i.e. 1,300,000 steps. The highest 10,000 series work unit I can see is only 32 generations (but I seem to remember much higher generations in the past), which is 320,000 steps.
The beta for this project is still ongoing. There I can see 196 generations of 10,000 steps (1,960,000 in total), 117 generations of 30,000 steps (3,510,000 in total) and 83 generations of 50,000 steps (4,150,000 in total).
I hope I have remembered correctly. |
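The step totals above are just generations multiplied by steps per work unit. A small sketch to reproduce them follows; the labels and the `total_steps` helper are hypothetical, and the figures are the ones recalled in the post.

```python
# (label, generations seen, steps per work unit), figures as recalled in the post
CHAINS = [
    ("FAHB, 50,000-step series", 26, 50_000),
    ("FAHB, 10,000-step series", 32, 10_000),
    ("Beta, 10,000-step series", 196, 10_000),
    ("Beta, 30,000-step series", 117, 30_000),
    ("Beta, 50,000-step series", 83, 50_000),
]


def total_steps(generations: int, steps_per_work_unit: int) -> int:
    """Total simulated steps after the given number of generations."""
    return generations * steps_per_work_unit


for label, generations, steps in CHAINS:
    print(f"{label}: {generations} x {steps:,} = "
          f"{total_steps(generations, steps):,} steps")
```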
||
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges: |
Quote: "Well, I think if the 24 hour deadline is too short for some of your machines, just don't run this project on those machines. I am only running this project on one machine (Linux) and am only seeing about 1 resend per 20 jobs. It could be different for Windows. Sometimes our hardware is just too limiting to do adequate justice to what we desire to crunch. Cheers"
That is a reasonable suggestion Sgt.Joe, however I have 2 things that work against doing that:
1. Limited number of profiles available. I am already using special profiles for Android and HSTB. There are none left to make a special profile to accommodate this problem. Adding additional profiles must be the oldest item on the Cruncher Wish List.
2. Without the contribution of the slower computers I cannot reach the next level, thus eliminating my incentive to continue to participate in FAHB vs. one of the other projects.
Cheers
[Edit to fix typo]
[Edit 1 times, last edit by NixChix at Jan 20, 2018 7:04:56 PM]