World Community Grid Forums
Category: Completed Research | Forum: FightAIDS@Home Phase 2 | Thread: 24 Hour Deadline too Short
Thread Status: Active Total posts in this thread: 112
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline Project Badges: |
Was already explained a few days ago and repeated ad nauseam. The tasks have a dependency and come in a sequence of hundreds: one result forms the base for generating the next step set. With a 24-hour deadline and a sequence of 300, they know they'll have a complete simulation series in at most 300 days for that target. If they allowed the common 7-10 day deadline, they would not have a complete simulation until 2025. Enough said.
+1
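For anyone who wants to check the arithmetic in that explanation, here is a rough sketch. It assumes the worst case, where every step in the chain uses its full deadline and nothing is reissued. The function name is made up for illustration; only the 300-step sequence and the 1, 7 and 10 day deadlines come from the post.

```python
# Rough turnaround arithmetic for a chain of dependent tasks.
# Assumption: every step takes its full deadline (worst case, no reissues).

def worst_case_days(steps: int, deadline_days: float) -> float:
    """Each step must be returned before the next can be generated, so delays add up."""
    return steps * deadline_days


STEPS = 300  # sequence length quoted in the post

for deadline in (1, 7, 10):
    days = worst_case_days(STEPS, deadline)
    print(f"{deadline:>2}-day deadline: {days:.0f} days (~{days / 365.25:.1f} years)")
```

With a 7-10 day deadline the same chain could take 2,100 to 3,000 days, which is where the multi-year estimate in the post comes from.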
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7574 Status: Offline Project Badges: |
Sgt. Joe wrote: "And, it runs by design with a short queue (loosened a slight bit lately per Lavaflow's request)"
Not yet, from what I can tell, but I devised a way around it. Set cc_config.xml with <ncpus>9</ncpus> to make WCG think there's a nine-core machine asking for work, and set the app_config with <project_max_concurrent>8</project_max_concurrent> to cap WCG at 8 concurrent jobs, which works as long as you are only computing for WCG. Now the pause is a few seconds between one FAHB task finishing and the next one starting.
I thought uplinger was going to tweak that setting. Maybe he has just not got around to it yet. Glad you found a workaround. Cheers
Sgt. Joe
*Minnesota Crunchers* |
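The workaround quoted in this post can be sketched as two small config files. The Python below only builds the XML so the structure is visible; it is not an official tool. The values 9 and 8 come from the post. The wrapper elements (`<cc_config>`, `<options>`, `<app_config>`) follow the BOINC client configuration documentation rather than anything stated in the thread, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """cc_config.xml fragment telling the client to act as if it had `ncpus` cores."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


def build_app_config(max_concurrent: int) -> str:
    """app_config.xml fragment capping how many of the project's tasks run at once."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Values from the post: report 9 cores, run at most 8 WCG tasks at a time.
print(build_cc_config(9))
print(build_app_config(8))
```

Per the BOINC documentation, cc_config.xml belongs in the BOINC data directory and app_config.xml in the project's own directory. As the post says, the trick only behaves this way when the machine computes for WCG alone.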
||
|
Sabrina Tarson
Advanced Cruncher United States Joined: Jun 27, 2012 Post Count: 149 Status: Offline Project Badges: |
I came up with my own little solution to this. I understand why the scientists behind the project set the work up the way it is, but I still think it's kind of pushy to have one project that, whenever a computer grabs a workunit for it, pushes all the other workunits to the side until it gets worked on. I personally don't deem any project being crunched here to be more important than any other.
So what I did was set one of my slower computers to be a dedicated cruncher, while the other computers in my fleet crunch every other project. That way, I still contribute to this project, while my other computers' workunits aren't being trampled over.
||
|
p51d
Cruncher Joined: Sep 19, 2006 Post Count: 15 Status: Offline Project Badges: |
Is the 24 hour deadline enforced?
Here's a server aborted FAH2 WU that I just had happen to me. WU received at 22:56 after the previous cruncher hadn't returned the result in the allotted 24 hours. 5 hours later at 4:04, the previous cruncher finally sends in a result (29 hours after they received the WU), and is given full credit. My WU is server aborted. Not a big deal to me, I have plenty of WUs, but if there is a mandatory 24 hour turnaround, why not just force abort late crunchers? |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7574 Status: Offline Project Badges: |
p51d wrote: "Is the 24 hour deadline enforced?"
It is a soft enforcement. If the unit is not returned within 24 hours, the system will then issue another unit. The system has no way of knowing whether the first unit will be completed shortly or never. If the first unit is completed before the second unit, then the second unit will be aborted by the system, unless it has already been started (I think). If neither the first unit nor the second unit is completed within 24 hours of the second unit being issued, then a third unit will be issued, and so on. The point is that once a unit has been completed, any other issued units will be aborted by the system unless they have been started. I have seen this several times for various projects, and I believe this is the way the system assures units will be completed in a timely manner. Other projects use different deadlines. I hope this helps. Cheers
Sgt. Joe
*Minnesota Crunchers* |
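The reissue behaviour described in this post can be sketched as a toy simulation. This is not the real server logic, only a model of the description above: a new copy goes out each time the latest one is 24 hours overdue, and once any result arrives, copies that have not started are aborted. The rule that started copies are left alone is the poster's own guess ("I think"). The `Host` class, `simulate` function and the example start and finish times are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DEADLINE_HOURS = 24.0  # per-copy deadline described in the thread


@dataclass
class Host:
    name: str
    start_after: Optional[float]   # hours after receipt until crunching starts (None = never)
    finish_after: Optional[float]  # hours after receipt until the result is returned (None = never)


def simulate(hosts: List[Host]) -> List[str]:
    issued = []          # (issue_time, host) for every copy actually sent out
    first_return = None  # earliest return time among the copies sent so far
    now = 0.0
    for host in hosts:
        if first_return is not None and first_return <= now:
            break        # a result is already in, so no further copy is needed
        issued.append((now, host))
        if host.finish_after is not None:
            done = now + host.finish_after
            first_return = done if first_return is None else min(first_return, done)
        now += DEADLINE_HOURS  # the next copy goes out once this one is overdue

    report = []
    for issue_time, host in issued:
        if first_return is None:
            status = "no result yet"
        elif host.finish_after is not None and issue_time + host.finish_after == first_return:
            status = f"returned first at hour {first_return:g}"
        elif host.start_after is not None and issue_time + host.start_after < first_return:
            status = "already started, left to finish"
        else:
            status = "aborted by server"
        report.append(f"{host.name}: {status}")
    return report


# Timeline loosely based on p51d's report: the first copy comes back at hour 29,
# five hours after the second copy was issued and before it was started.
for line in simulate([Host("earlier cruncher", 20, 29), Host("p51d", 8, 12)]):
    print(line)
```

In this model a late result still counts, which matches p51d's observation that the earlier cruncher got credit while the unstarted second copy was aborted.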
||
|
Seoulpowergrid
Veteran Cruncher Joined: Apr 12, 2013 Post Count: 815 Status: Offline Project Badges: |
Sabrina Tarson wrote: "So what I did, was I just set one of my slower computers to be a dedicated cruncher, while the other computers in my fleet will crunch every other project. That way, I still contribute to this project, while my other computer's workunits aren't being trampled over."
There was an update to the Device Profile page a few weeks ago. Now you can select max number of project WUs for each profile. Case in point, I've now added all my machines to FAHB and gave them a max of 1 WU per machine. Now FAHB doesn't overcrowd the machines. Yes, it still jumps to first, but only 1 WU is jumping, so all other threads are crunching away on other projects quite nicely. :)
||
|
Sabrina Tarson
Advanced Cruncher United States Joined: Jun 27, 2012 Post Count: 149 Status: Offline Project Badges: |
Sabrina Tarson wrote: "So what I did, was I just set one of my slower computers to be a dedicated cruncher, while the other computers in my fleet will crunch every other project. That way, I still contribute to this project, while my other computer's workunits aren't being trampled over."
Seoulpowergrid wrote: "There was an update to the Device Profile page a few weeks ago. Now you can select max number of project WUs for each profile. Case in point, I've now added all my machines to FAHB and gave them a max of 1 WU per machine. Now FAHB doesn't overcrowd the machines. Yes, it still jumps to first, but only 1 WU is jumping, so all other threads are crunching away on other projects quite nicely. :)"
I ended up doing this instead, because after your post I realized that FAAH isn't always reporting that it has work, so the one machine would sometimes be doing nothing. Thanks for the recommendation.
||
|
obecalp
Cruncher El Salvador Joined: Oct 28, 2008 Post Count: 3 Status: Offline Project Badges: |
Well... I keep getting some errors and "too late"s in this project, so I guess I will just wait to get the bronze medal and then drop this project so my computers are not doing work for nothing...
|
||
|
yangbomb
Cruncher Joined: Aug 6, 2015 Post Count: 16 Status: Offline Project Badges: |
I just inspected how the work units work:
Every work unit will send out a replication if the previous user didn't return the result in time (which is 24 hours). And until the task is purged from the database, anyone who returns a result will still get credit. Maybe the 24 hour deadline isn't that scary after all. http://i.imgur.com/wppg2Qd.png
Edit: The link is proof of workunits still getting credit even though I returned them 12 hours late.
[Edit 1 times, last edit by yangbomb at Feb 6, 2019 12:12:21 PM]
||
|
Sir Antony Magnus
Cruncher Joined: Feb 13, 2019 Post Count: 2 Status: Offline |
I am in agreement that these work units are way too short. That causes too many issues, as many others have pointed out, for users who have multiple project demands. The whole concept of Distributed Computing, in my eyes, was always to use a computer's idle time for science and forget about speed or return time. They need to remember this is done voluntarily and we do incur costs as end users. With these short deadlines we also spend our valuable time on micromanagement!
Having said that, I will not be contributing.
||
|
|