| World Community Grid Forums
Thread Status: Active Total posts in this thread: 264
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline Project Badges:
|
They are starting to lengthen again. Went from 10 minutes to 30 minutes and current ones are approximately 95 minutes each. For reference, prior to the 10 minute ones, the machine was running 245-250 minutes a task.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I haven't seen any of those ultra short jobs in MCM1 yet, but I maintain about a 24 hour queue, so maybe tomorrow. (As an aside, I have started to see ultra short FAHV units in Fight Aids.) Coincidence? Or maybe a conscious decision by the techs/scientists to make the WU's easier to digest? Cheers

There have been a few quick-running batches in the past, but they weren't 10-minute quick. Now that the WU times have started to go back up, it could very well be just some quick-running datasets. Maybe the scientists have made them shorter, but that would also change their 350 million results prediction. If they did make a change, we will be the last to know (as usual).

A 24-hour queue can be dangerous. If those 10-minute WU's were sustained and the time estimates on your machines adjusted to match, and the run time then went back to several hours or more, you could have more WU's than you could report within the deadline period. If you received resends, those would definitely have to be sent back out. |
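To put rough numbers on that scenario, here is a back-of-the-envelope sketch in Python. It is not how the BOINC client actually schedules work fetch. It simply assumes the client fills the whole queue using the short estimate and that every fetched task then runs at the long time. The 4-core machine and the 7-day deadline are assumptions for illustration. The 10-minute and 250-minute run times are the ones reported at the top of this thread.

```python
import math

# Illustrative inputs; CORES and DEADLINE_DAYS are assumptions, not project facts.
QUEUE_HOURS = 24      # queue depth the client tries to keep filled
CORES = 4             # hypothetical machine running 4 tasks at once
EST_MINUTES = 10      # run time the client currently estimates per task
ACTUAL_MINUTES = 250  # run time once tasks return to their normal length
DEADLINE_DAYS = 7     # assumed deadline


def tasks_fetched(queue_hours, cores, est_minutes):
    """Tasks needed to fill the queue if every task took est_minutes."""
    return math.floor(queue_hours * 60 * cores / est_minutes)


def hours_to_finish(tasks, cores, actual_minutes):
    """Wall-clock hours to clear the tasks at the real run time, all cores busy."""
    return math.ceil(tasks / cores) * actual_minutes / 60


def tasks_done_by_deadline(deadline_days, cores, actual_minutes):
    """Tasks that can finish before the deadline at the real run time."""
    per_core = math.floor(deadline_days * 24 * 60 / actual_minutes)
    return per_core * cores


fetched = tasks_fetched(QUEUE_HOURS, CORES, EST_MINUTES)
needed_hours = hours_to_finish(fetched, CORES, ACTUAL_MINUTES)
on_time = min(fetched, tasks_done_by_deadline(DEADLINE_DAYS, CORES, ACTUAL_MINUTES))
late = fetched - on_time

print(f"tasks fetched: {fetched}")
print(f"hours needed at the real run time: {needed_hours}")
print(f"finished before the deadline: {on_time}, late: {late}")
```

This is the worst case, where the estimate never corrects while the queue is being filled. In practice the client adjusts its estimates as tasks complete, so the real overshoot would likely be smaller.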
||
|
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges:
|
A 24-hour queue can be dangerous

No, I don't think a 24-hour queue is dangerous. I have had a 1.5 day (36 hour) queue for some time (approx. 1 year) and nothing has happened. Now, could the scenario you described really happen? Yes. Is it likely that it will happen? No. And if it did happen, the only consequence would be that the excess WUs would time out and be sent out again to be crunched. So no danger.

I'm sure there are crunchers that have queues bigger than those described above. A couple of times, for DDDT2, I had a 10 day queue so I could capture all the WUs possible (and yes, I would lose some WUs at the very end, which just got sent out again).

Now this is an excellent opportunity to discuss what the ideal queue to carry is and why. Comments?

CJSL
Gotta keep crunching, there's a world to save... |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
A 24-hour queue can be dangerous, if those 10 minute WU's were sustained and the time estimation on your machines matched as such, if the time then went back to the several hour or more time frame, you could have more WU's than you could report in the deadline period

I agree with cjslman on this. I have maintained a 24 hour queue on most of my machines for more than a couple of years. I have yet to encounter the situation you describe. It is both possible and plausible, but highly unlikely, in my opinion. I have experienced very rare occurrences (1 or 2, I think) when I have run out of work. Normally, if WCG is going to have an outage of any sort, there is adequate notice so queues can be adjusted accordingly. Unexpected outages have been extremely rare. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No, I don't think a 24 hours queue is dangerous. I currently have had a 1.5 day (36 hours) queue for some time (aprox. 1 year) and nothing has happened. Now, could the scenario you described really happen? Yes. Is it likely that it will happen? No.

"No" based upon what evidence? Also, with too long a queue and a minimum quorum of two, if one result is returned and the wingman takes a lot longer, that WU sits on the server and takes up storage. Not long ago MCM had an issue where the WU results were bigger than normal but still within what the project said was possible. This caused storage issues and caused the project to be temporarily disabled. Add in large WU results and big queue depths, and some WU results might sit for over a week before the wingman returns their result.

I have also seen large queues actually slow processing down. WU's can be aborted on the server side as well, and when your computer communicates with the project servers it validates the WU's in your queue; more WU's means more processing required. I don't run a long queue in WCG, but I did on another BOINC project and noticed that when I increased the queue depth, processing went down, and when I reduced the queue, processing went up. The WU times were always consistent but crept up when the queue depth was increased.

MCM1 has had short-running WU's before, but usually it is just 10,000 WU's and not multiple sets in a row. By running a shorter queue, I ran quite a few of the short units, whereas the people with longer queues only got a few. Your machine viewed them as standard-runtime WU's, which is how mine viewed them too, but mine kept downloading more of them as the currently running ones approached 100%. Several machines that had 12 or 16 concurrent WU's running were almost always downloading additional WU's and reporting completed ones, and this went on for several hours before the WU times started to increase again.

On a typical day I complete around 500 WU's, spread mainly over MCM1 and CEP2 with a few FAAH results. On the 6th I was at over 800, so more than 50% above normal, and so far on the 7th I'm at over 2,700. So in my case, a shorter queue allowed me to run through more WU's. |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
No based upon what evidence?

Based on the evidence that I have yet to experience, over several years, the situation you detail. I agree BOINC's feedback scheduling mechanism could have a runaway effect, but it does adjust pretty quickly. The wider the disparity in projected WU completion times, the more likely your scenario is to happen, especially if the change from short to long units is quite abrupt. I know the techs try to size the WU's appropriately, so unless they run into some unexpected situations, I trust they will intervene in a timely fashion to prevent, or at least mitigate, a BOINC runaway situation. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The large WU's caught them off guard even though the sizes being returned did fall into the range the project said was possible. Just because something hasn't happened doesn't mean it won't.
|
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Just because something hasn't happened doesn't mean it won't.

True. But what is the probability it will? And how serious are the consequences if it does? Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Large WU results and long queues played a role in the recent shutdown. Add in changing run-times and you have the trifecta.
|
||
|
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1326 Status: Offline Project Badges:
|
Large WU results and long queues played a role in the recent shutdown. Add in changing run-times and you have the trifecta.

I thought the reason for the slow down was the extra big upload file sizes, and that the deadline was shortened to a week so results could be returned to the scientists faster, freeing up server space. Work can only be returned to the scientists when a whole batch is complete. I believe the upload file sizes have now been sorted out. |
||
|
|
|