World Community Grid Forums
Thread Status: Active Total posts in this thread: 17
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi,
Since yesterday I keep getting errors saying there is no response from the server at all, no headers or anything. This started halfway through uploading some results yesterday. Is anyone else getting no response from the server like this?
toss
Senior Cruncher New Zealand Joined: Jan 3, 2007 Post Count: 220 Status: Offline
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
Yesterday, or possibly just before midnight, depending on where you are on the planet. See the various other threads ("What Happened", "Server Status") and the outage message by knreed: the whole WCG network operation was offline for over 6 hours. If you have files to upload in the Transfers window, select one and hit the Retry Now button.
While the servers were offline, the nightly statistics still updated fine, just with lots of job reports that did not make it in; those will be included today.
WCG
Please help to make the Forums an enjoyable experience for All!
jmcgaw
Advanced Cruncher US Joined: Feb 2, 2007 Post Count: 54 Status: Offline
Yeah, I got bitten by the same thing.
Even worse, my most productive machine, "BEAST", sits in its little room in the basement and only gets looked at occasionally over Remote Desktop. When I checked, it said all its tasks were done, none were running, and communications were deferred for 6+ hours. That last part seems a truly dumb thing to write into a program. Maybe defer for 10 minutes, or for some short random period, but six hours is far too long: BEAST can chew through 8-10 work units in that time. My two other fast machines were at least only stuck for 3 hours, which is better. I've raised my queue of spare work units to 1 day so some work can still get done if things go wonky again. Maybe it should be longer than that, though. Perhaps the client needs a mechanism to notify us by email when something like this goes wrong. That should be doable; other programs manage it.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I have 6 machines that are down until I can get to them. It's a shame to have 10 cores down for no discernible or preventable reason.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
What client version are you running? The deferrals are incremental, so if something unexpected happened, a 6-hour deferral is possible once repeated attempts to contact the server had failed.
The price to the grid was dear. Tuesday morning's session came in at 115 CPU years instead of the estimated ~155, on top of an estimated 25-year shortfall for Monday (the failure started on Monday). The afternoon session came back at 142, another 12-13 years short, so in all 78+ years were lost, irredeemably.
With better logic in the upcoming client, and features such as the zero-weight backup project function to keep a client from going completely idle, it's up to the techs and developers to see what else needs tweaking. Certainly, if projects have hundreds of thousands of devices active and coming back up, you wouldn't want them all connecting within the hour of coming back online... that would cause another crash for sure.
Edit: many of the deferred contacts reported in on Wednesday morning, turning in an extra 22 CPU years over last week's same-day morning session: 178 now versus 156 last week. Maybe there will be another little kick this afternoon from those that had been running on cache.
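The shortfall figures in this post can be cross-checked with a quick tally. This only re-adds the numbers quoted above (treating "12-13 short" as a range and ~155 as the stated estimate); it is not an independent estimate of grid output.

```python
# Re-add the CPU-year figures quoted in the post (all values in CPU years).
monday_shortfall = 25                   # estimated Monday shortfall
tuesday_morning_shortfall = 155 - 115   # estimated ~155, actual 115
tuesday_afternoon_shortfall = (12, 13)  # "another 12-13 short", as a low/high range

low = monday_shortfall + tuesday_morning_shortfall + tuesday_afternoon_shortfall[0]
high = monday_shortfall + tuesday_morning_shortfall + tuesday_afternoon_shortfall[1]
print(f"Total shortfall: {low}-{high} CPU years")  # Total shortfall: 77-78 CPU years

# Wednesday morning catch-up as deferred clients reported in.
wednesday_extra = 178 - 156
print(f"Wednesday extra: {wednesday_extra} CPU years")  # Wednesday extra: 22 CPU years
```

The upper end of the range matches the "78+ years" figure in the post.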
[Edited 1 time, last edit by Sekerob at Feb 3, 2010 12:47:44 PM]
Decrypt74
Cruncher France Joined: May 20, 2009 Post Count: 36 Status: Offline
Most of mine were deferred by more than 17 hours.
Misconfiguration on my side?
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
Since my client is connected to the internet 24/7, I've set the connect interval to 0.00 days. When I checked in about 8 hours into the crash, the deferral counter was at 31 minutes, but that's with the 6.10.29 alpha client. As WCG is preparing and testing 6.10, I've asked them to look into the logic to see if it needs further tweaking. It may differ from, and improve on, the present 6.2 client that is still the standard download. I can't find readily digestible documentation that explains the deferral states, how far they increment and so on. In the past I've seen deferrals go over 24 hours when a project stayed down or unreachable.
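The thread doesn't document how the client actually grows its deferrals, so the following is only a generic sketch of capped exponential backoff with random jitter, the general technique behind "incremental" deferrals plus the "short random period" idea raised earlier. `BASE_DELAY_S`, `MAX_DELAY_S` and `next_deferral` are hypothetical names and values, not the BOINC client's real parameters or code.

```python
import random

# Hypothetical parameters: NOT the actual BOINC client values.
BASE_DELAY_S = 60            # first retry ceiling: about a minute
MAX_DELAY_S = 24 * 60 * 60   # never defer longer than 24 hours


def next_deferral(failures: int, rng: random.Random) -> float:
    """Return a retry delay in seconds after `failures` consecutive failed contacts.

    Capped exponential backoff with "full jitter": the ceiling doubles with each
    failure up to MAX_DELAY_S, and the actual delay is drawn uniformly from
    [0, ceiling], so many clients recovering from the same outage spread their
    reconnects out instead of all hitting the server at once.
    """
    if failures < 1:
        return 0.0
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (failures - 1))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is reproducible
    for n in range(1, 13):
        delay = next_deferral(n, rng)
        print(f"failure {n:2d}: defer {delay / 3600:6.2f} h")
```

With these made-up numbers the ceiling first exceeds 6 hours on the 10th consecutive failure and hits the 24-hour cap on the 12th; the jitter is what keeps a large fleet from reconnecting within the same hour.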
Decrypt74
Cruncher France Joined: May 20, 2009 Post Count: 36 Status: Offline
My clients are also connected 24/7, and I've set the connect interval to 0 and the additional work buffer to 0.05 days.
My Windows clients range from 6.6.36 to 6.10.18, and the Linux ones run 6.4.5.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
If you want to be able to bridge downtimes, set it back to the default 0.3-day additional buffer (in local preferences) or the cache setting in the device profile on the website. Judging by wingmen validations, I think many run with 0.5 to 1.00 days or more. 1.00 day is also the longest the client will wait before sending Ready to Report jobs, whereas result files are always uploaded ASAP when a connection is available.
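To see why 0.05 days is thin for an outage of over 6 hours while the 0.3-day default is enough, here is a rough sketch that converts a buffer setting into hours of queued work. It assumes, simplistically, that the queue is full when the outage starts and that the setting maps directly to that many hours of work; the real client's work fetch is more involved. `buffer_covers` and `extra_deferral_hours` are hypothetical names for illustration.

```python
# Rough check: does an additional-work buffer (in days) cover a server outage?
# Simplifying assumptions (not how the BOINC client actually sizes its queue):
#  - the queue is full when the outage starts,
#  - the buffer setting translates directly into that many hours of work,
#  - the host also sits out any deferral still pending after the servers return.

HOURS_PER_DAY = 24


def buffer_covers(buffer_days: float, outage_hours: float,
                  extra_deferral_hours: float = 0.0) -> bool:
    """True if `buffer_days` of queued work lasts through the outage plus any leftover deferral."""
    return buffer_days * HOURS_PER_DAY >= outage_hours + extra_deferral_hours


if __name__ == "__main__":
    outage = 6.0  # the outage in this thread lasted over 6 hours
    for days in (0.05, 0.3, 0.5, 1.0):
        verdict = "covers" if buffer_covers(days, outage) else "runs dry during"
        print(f"{days:4.2f} days = {days * HOURS_PER_DAY:5.1f} h of work: {verdict} a {outage:.0f} h outage")
```

The `extra_deferral_hours` argument reflects the point made earlier in the thread: a host can stay idle after the servers come back if its contact deferral hasn't expired yet, so a buffer that just covers the outage may still run dry.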