World Community Grid Forums
Thread Status: Active | Total posts in this thread: 182

armstrdj
Former World Community Grid Tech | Joined: Oct 21, 2004 | Post Count: 695 | Status: Offline
> One of our first steps in our cut-over plan is to "Extend report deadline for all in progress results by 2 days". This will address any issues with results that might become overdue while clients are unable to access the website.

> How are you planning on handling the "out-of-order" trickles from FAH2?

For all projects except FAH2, the validators and assimilators will be brought back up as soon as the migration is complete. For FAH2 we will delay and stagger some of the back-end processes to accommodate the trickle messages.

Thanks, armstrdj
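
In server terms, "extend report deadline" amounts to a single bulk update. Below is a minimal sketch, assuming a BOINC-style result table whose report_deadline column holds a Unix timestamp and whose server_state value 4 means "in progress"; the in-memory SQLite database is an illustrative stand-in, not WCG's actual MySQL schema or tooling:

```python
import sqlite3
import time

TWO_DAYS = 2 * 24 * 60 * 60  # seconds

# Illustrative stand-in for the BOINC server database (the real one is MySQL).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE result (id INTEGER PRIMARY KEY, server_state INTEGER, "
    "report_deadline INTEGER)"
)
now = int(time.time())
db.execute("INSERT INTO result VALUES (1, 4, ?)", (now + 3600,))  # due in 1 hour

# "Extend report deadline for all in progress results by 2 days":
# in BOINC's schema, server_state 4 is RESULT_SERVER_STATE_IN_PROGRESS.
db.execute(
    "UPDATE result SET report_deadline = report_deadline + ? WHERE server_state = 4",
    (TWO_DAYS,),
)
db.commit()

print(db.execute("SELECT id, report_deadline - ? FROM result", (now,)).fetchall())
# -> [(1, 176400)]  i.e. the hour that was left, plus two days
```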

Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> For FAH2 we will delay and stagger some of the back end processes to accommodate the trickle messages.

Just to pull this back into active memory: if trickle 10 is not uploaded/reported AFTER trickle 9, and trickle 9 AFTER trickle 8, and your trickle processor/validator gets the trickles served *out of order*, the result is deemed invalid. This happens in day-to-day operations (a longstanding issue with FAHB), so why would I trust the avalanche of result/trickle reporting to be processed in the correct order, when we've been forcibly crunching any FAH2 off-line? That is the concern voiced by others, again. Personally, I'm not going to speculate on this going right, so I'm draining this project from my machines, well before JIT.
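
The ordering requirement itself is straightforward to express in code. Here is a minimal sketch of a sequencer that buffers out-of-order trickles until their predecessors arrive, which is roughly what "delay and stagger some of the back-end processes" would have to achieve; the (result_id, seq_no, payload) shape and the TrickleSequencer class are hypothetical illustrations, not WCG's actual trickle handler:

```python
from collections import defaultdict

class TrickleSequencer:
    """Deliver each result's trickles in sequence order, buffering gaps.

    Hypothetical model: trickles arrive as (result_id, seq_no, payload),
    with seq_no starting at 1 and no gaps once all have arrived.
    """

    def __init__(self, process):
        self.process = process                  # callback for in-order trickles
        self.next_seq = defaultdict(lambda: 1)  # next expected seq_no per result
        self.pending = defaultdict(dict)        # buffered seq_no -> payload

    def receive(self, result_id, seq_no, payload):
        self.pending[result_id][seq_no] = payload
        # Flush everything that is now contiguous from the front.
        while self.next_seq[result_id] in self.pending[result_id]:
            seq = self.next_seq[result_id]
            self.process(result_id, seq, self.pending[result_id].pop(seq))
            self.next_seq[result_id] += 1

seq = TrickleSequencer(lambda r, s, p: print(f"result {r}: trickle {s} -> {p}"))
seq.receive("wu42", 2, "b")  # arrives early: buffered, nothing printed
seq.receive("wu42", 1, "a")  # fills the gap: trickles 1 and 2 flush, in order
```

A processor without such buffering, one that rejects anything out of sequence, turns every mis-ordered delivery into an invalid result instead of a short delay, which is exactly the failure mode described above.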

Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> Personally, I'm not going to speculate on this going right, so I'm draining this project from my machines, well before JIT.

Additionally, members have no control over the sequence in which files get uploaded. This is especially true if the network becomes active while a member is sleeping. Since we are lacking a proper explanation of how the order would be maintained by WCG once the network becomes active, I have to agree with Sekerob and drain the project manually from the member's side. Also, based on history, I'm not totally convinced that the FAH2 project would resume normally after draining. The crystal ball shows many days of issues for FAH2 once the grid is resumed.

Richard Mitnick
Veteran Cruncher, USA | Joined: Feb 28, 2007 | Post Count: 583 | Status: Offline
It would be good if we could get an idea of what this move will do and why it is good. WCG already has an IBM center for all of its work.
----------------------------------------
Could you tell us please, what will this actually do for WCG?

adriverhoef
Master Cruncher, The Netherlands | Joined: Apr 3, 2009 | Post Count: 2346 | Status: Offline
As this message stated: "Among other things, this move will allow us to identify, diagnose and address major technical issues more quickly."

Arquebus
Advanced Cruncher | Joined: Jul 9, 2008 | Post Count: 70 | Status: Offline
> Will sufficient WU will be sent to not overflow the available work?

vep, can you clarify what you mean here? If you want to make sure you have enough work in your queue to cover the outage, you may need to modify your settings to increase the number of tasks you download. To accomplish this, set "Cache n extra days of work" to 1 or 2 to be safe.
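
For anyone steering this from the client rather than the website profile, the equivalent knobs in BOINC's global_prefs_override.xml are work_buf_min_days and work_buf_additional_days (the client buffers roughly their sum, in days of work). A minimal sketch that writes such a file; the /var/lib/boinc-client path is an assumption for a typical Linux install, so adjust it for yours:

```python
from pathlib import Path

# Assumed BOINC data directory; adjust for your install
# (e.g. C:\ProgramData\BOINC on Windows).
BOINC_DIR = Path("/var/lib/boinc-client")

def write_cache_override(min_days: float, extra_days: float) -> None:
    """Write a minimal global_prefs_override.xml setting the work cache.

    The client buffers roughly min_days + extra_days of work. After writing,
    tell the client to re-read it: boinccmd --read_global_prefs_override
    """
    xml = (
        "<global_preferences>\n"
        f"   <work_buf_min_days>{min_days}</work_buf_min_days>\n"
        f"   <work_buf_additional_days>{extra_days}</work_buf_additional_days>\n"
        "</global_preferences>\n"
    )
    (BOINC_DIR / "global_prefs_override.xml").write_text(xml)

# Cache roughly 3 days of work ahead of a planned 2-day outage:
write_cache_override(1.0, 2.0)
```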

dango
Senior Cruncher | Joined: Jul 27, 2009 | Post Count: 307 | Status: Offline
> If you want to make sure you have enough work in your queue to cover the outage, you may need to modify your settings to increase the number of tasks you download.

It was said that the limit will be extended to 70 per core.

keithhenry
Ace Cruncher, "Senile old farts of the world ....uh.....uh..... nevermind" | Joined: Nov 18, 2004 | Post Count: 18667 | Status: Offline
> it was spoken, that limit will be extended to 70/core

Doubling the limit to 70 will still leave the vast majority of my machines dead in the water soon after the shutdown starts. The limit needs to be done away with a day or so ahead of the shutdown, until it is over.

Right now, the 35-WU-per-core limit gets me 15-26 hours of work with SCC, HSTB and FAAH selected, and that's better than normal. That 35 limit has been as little as 2-3 hours of work on these same machines (Xeon chips) when FAAH or SCC had the really short WUs. A 140 limit might work if they have absolutely no issues with the move or right after it, but who can guarantee that?

With a PLANNED two-day outage, I will want to have three days of work in each machine's queue at the start. With the 35 limit kicking back in when the servers come back up, my machines would simply not get any new work until they work back down under the limit in the hours after they come back up. The large majority of Linux machines will go idle during this outage, even with a 70-per-core limit.
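
These figures are easy to reproduce: with one running task per core, a per-core limit caps the queue at limit times average-runtime hours. A minimal sketch follows; the per-WU runtimes are illustrative guesses chosen to bracket the 15-26 hour and 2-3 hour figures above, not measured values:

```python
def queue_hours(per_core_limit: int, avg_runtime_h: float) -> float:
    """Hours of work a full per-core queue represents (one task per core)."""
    return per_core_limit * avg_runtime_h

# Illustrative average runtimes (hours per WU) chosen to match the post:
cases = [
    ("short SCC/FAAH batch", 0.07),
    ("typical mix, low", 0.45),
    ("typical mix, high", 0.75),
]
for label, rt in cases:
    print(f"{label:>22}: 35/core -> {queue_hours(35, rt):5.1f} h, "
          f"70/core -> {queue_hours(70, rt):5.1f} h")

# Per-core limit needed to cover a 3-day (72 h) outage:
for _, rt in cases:
    print(f"runtime {rt:.2f} h/WU needs a limit of {72 / rt:.0f} per core")
```

On these assumptions, even a 140-per-core limit covers a 72-hour outage only when the average WU runs about half an hour or longer, which is exactly the point about the short FAAH/SCC batches.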

Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Unfortunately, members will start hitting the 1000-WU limit hardcoded in BOINC. It happens to me right now on my 32-thread machines, even with the 35 per core. Increasing it to 70 per core means 16-thread machines will hit the 1000-WU limit before getting 70 per core. Anything over 70 will only help machines with 8 threads or fewer. Relative to SCC and FAH1, most members will hit the 1000-WU limit long before they get 3 days' worth of work... The smart thing might be to mix MCM in with the shorter units to get a 3-day queue.
----------------------------------------
[Edited 1 time; last edit by Doneske at Apr 27, 2017 12:14:48 AM]
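
The interaction between the per-core limit and the hard cap is simple min() arithmetic; a quick sketch reproducing the thread counts above (the 1000 figure is the hardcoded BOINC ceiling the post refers to):

```python
HARD_CAP = 1000  # hardcoded per-host task ceiling referred to in the post

def effective_cache(threads: int, per_core_limit: int) -> int:
    """Tasks a host can actually hold: per-core limit, clipped by the cap."""
    return min(threads * per_core_limit, HARD_CAP)

for threads in (8, 16, 32):
    print(f"{threads:>2} threads: 35/core -> {effective_cache(threads, 35):4d}, "
          f"70/core -> {effective_cache(threads, 70):4d}")
#  8 threads:  280 ->  560  (doubling the limit helps in full)
# 16 threads:  560 -> 1000  (capped: 70/core would otherwise be 1120)
# 32 threads: 1000 -> 1000  (already capped at 35/core)
```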

Sgt.Joe
Ace Cruncher, USA | Joined: Jul 4, 2006 | Post Count: 7849 | Status: Offline
Just want to throw my two cents' worth in. If, under the best of circumstances, some of my machines go dry, I will use this opportunity to shut them down and do the spring cleaning on them. I do need to blow them out at least once a year anyway. I will max out my queues the day before and wait to see what happens.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*