| World Community Grid Forums
Thread Status: Active Total posts in this thread: 352
RCC_Survivor
Veteran Cruncher USA Joined: Apr 28, 2007 Post Count: 1337 Status: Offline Project Badges:
|
So what do we crunchers do for the meantime: limit our sync-to-WCG to once every 24hrs? That would seem to be a direct solution to regulate the traffic and thereby help ease congestion. Or, do we crunchers have to play it by ear?
andzgrid, this is a good suggestion, and I would be glad to set my network access in Preferences to a time period when server load is at a minimum.
SekeRob or knreed, Do you have any server load info that would help determine the hours when server load is heaviest and lightest? I do not mind running a 2-day queue with network access restricted to a few hours a day. I have lost a lot of WUs since last month and will do whatever it takes to reduce the losses. Am I wrong, or did this problem start after they did some recent software upgrades?
Be kinder than necessary, for everyone you meet is fighting some battle.
Please join the team The survivors
Bilateral Renal, Melanoma, and Squamous Cell cancers |
||
|
|
BSD
Senior Cruncher Joined: Apr 27, 2011 Post Count: 224 Status: Offline |
My devices are playing in someone else's yard for a while.
Hope the techs get it all sorted out. They are a busy bunch. Cheers |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
SekeRob or knreed, Do you have any server load info that would help determine the hours when server load is heaviest and lightest? I do not mind running a 2-day queue with network access restricted to a few hours a day. I have lost a lot of WUs since last month and will do whatever it takes to reduce the losses. Am I wrong or did this problem start after they did some recent software upgrades?
Not likely to happen, but in the US of A / GB, where I estimate most Update / Retry Now hammerers to be, just follow the position of the sun and the moon and you have a pretty good idea when the bulk of button operators are not watching for you to sneak in those uploads.
Yesterday had 305, day before 455, days before that 1, 10, 14, so I think things in autonomous mode are doing pretty good with the [fuzzy logic] scheme knreed has devised to respond to momentary overloads. Sure enough there are the random back-offs when things are too busy, but so far they all seem to recover after the deferral countdowns have run down, one / two / three times. Keep a fair cache 1.0 days and never a moment dry. In all this, lost zero tasks. |
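For anyone wondering what "random back-offs" with a deferral countdown can look like, here is a minimal sketch, assuming a capped exponential back-off with random jitter. This is not the BOINC client's actual code; the function name `deferral_seconds` and the `base` and `cap` values are hypothetical and chosen only for illustration.

```python
import random


def deferral_seconds(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized wait (in seconds) before retry number `attempt`.

    `attempt` is 1-based. `base` and `cap` are hypothetical values, not
    settings taken from BOINC or World Community Grid.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # The ceiling doubles with every failed attempt, up to `cap`.
    ceiling = min(cap, base * 2 ** (attempt - 1))
    # Pick uniformly below the ceiling so that clients which failed at the
    # same moment do not all come back at the same moment.
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, round(deferral_seconds(attempt)))
```

The randomness is the point: pressing Update or Retry Now by hand cancels the spread and puts the request back into the busy period.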
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
... and would be glad to set my network access in Preferences to a time period when server load was at a minimum.
Going forward, we need to first make the above manual step if only as preparatory to the next step: WCG-server to WCG-clientMachine M2M (machine-to-machine) communications. The data in a serverStatus (webpage) would facilitate the transition. |
||
|
|
RCC_Survivor
Veteran Cruncher USA Joined: Apr 28, 2007 Post Count: 1337 Status: Offline Project Badges:
|
SekeRob or knreed, Do you have any server load info that would help determine the hours when server load is heaviest and lightest? I do not mind running a 2-day queue with network access restricted to a few hours a day. I have lost a lot of WUs since last month and will do whatever it takes to reduce the losses. Am I wrong or did this problem start after they did some recent software upgrades?
Not likely to happen, but in the US of A / GB, where I estimate most Update / Retry Now hammerers to be, just follow the position of the sun and the moon and you have a pretty good idea when the bulk of button operators are not watching for you to sneak in those uploads. Yesterday had 305, day before 455, days before that 1, 10, 14, so I think things in autonomous mode are doing pretty good with the [fuzzy logic] scheme knreed has devised to respond to momentary overloads. Sure enough there are the random back-offs when things are too busy, but so far they all seem to recover after the deferral countdowns have run down, one / two / three times. Keep a fair cache 1.0 days and never a moment dry. In all this, lost zero tasks.
(Edit - removed reference because of SekeRob's comment "not a good idea to mention".)
There are problems after sundown. Based on a lack of info I will use SWAG and experiment until I get it right. Shouldn't have to do this. I am really surprised that the servers do not have a load balance/control system and the techs have to improvise. Again, I think this problem started after the server/filesystem upgrades/updates in June. It is difficult to troubleshoot intermittent problems when multiple changes were made in a short period.
Been there, done that. When we had a problem it was discussed on the evening news and in the next day's newspaper, so we met/exceeded our 99.99% uptime target because our paycheck was on the line. If there was an outage we were required to write letters to upper management explaining why it happened and what we would do to prevent it in the future. So I understand the difficulty and pressure involved in resolving service problems.
[Edit 1 times, last edit by RCC_Survivor at Jul 29, 2012 7:14:52 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not a good idea to mention that, and it has become largely superfluous, particularly for those who do scheduled networking and crunching, and for a few more reasons [client 7.0.xx] ;), but *no*, this is separate from the upload saturation issue... the 1st part of the upload/reporting cycle. The RtR is part 2.
It needs no repeating that something in the upgrade path on the server side kicked this off, and IBM is now working intensively with the likes of Red Hat and hosting support to get to the root of the issue. |
||
|
|
RCC_Survivor
Veteran Cruncher USA Joined: Apr 28, 2007 Post Count: 1337 Status: Offline Project Badges:
|
Changed queue to 2.0 days and limited network access to 00:00-06:00 EDT (UTC-4), and experienced problems at 02:57 and 03:43 on separate PCs.
There may not be a time period that is free from the problem. I feel I am having a "Popeil moment" and will "set it and forget it" as there are bigger fish to fry.
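As a small illustration of the restricted window described above, here is a sketch of a time-of-day check, assuming a half-open window [start, end) in local time. The helper `in_network_window` is hypothetical and only mimics the idea of the network access hours in Preferences; it is not how the BOINC client evaluates them.

```python
from datetime import time


def in_network_window(now, start=time(0, 0), end=time(6, 0)):
    """Return True if local time `now` falls inside [start, end).

    Windows that wrap past midnight (for example 22:00-04:00) are handled.
    The default 00:00-06:00 window mirrors the one mentioned in the post.
    """
    if start <= end:
        return start <= now < end
    # Wrapped window: valid from `start` until midnight, then until `end`.
    return now >= start or now < end


if __name__ == "__main__":
    for t in (time(2, 57), time(3, 43), time(6, 0)):
        print(t, in_network_window(t))
```

Both reported failures, 02:57 and 03:43, fall inside the 00:00-06:00 window, which fits the observation that no period seems to be free of the problem.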
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello RCC_Survivor,
https://secure.worldcommunitygrid.org/forums/...d,33316_offset,180#385932 and https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,33481 imply that there is no schedule. They isolate the file system on an as-required basis. Lawrence |
||
|
|
Bearcat
Master Cruncher USA Joined: Jan 6, 2007 Post Count: 2803 Status: Offline Project Badges:
|
Think it's best to let BOINC do its thing and let the servers tell our computers when to check back. Seems to work itself out sooner or later.
----------------------------------------
Crunching for humanity since 2007!
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
The occurrence of this issue does not appear to correlate with load. As a result, we have not been able to predict when it will occur. We have been doing a lot of data collection this weekend that will allow for further analysis to hopefully find some clues to what is going on.
We have also figured out how to quickly detect that the issue is occurring in some of our backend processes, such as the applications that we use to load new work into BOINC for distribution. We are using this to cause processes that are not volunteer facing to 'back off' so that the system recovers quickly. We hope that this will significantly reduce the times when you are not able to upload/download work. We put this in place about 2 hours ago and we are watching to see what happens. |
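To make the idea concrete, here is a rough sketch of a backend job that yields whenever a health check reports trouble. It only illustrates the general pattern; it is not the WCG implementation, and every name in it (`load_work`, `is_degraded`, `load_batch`) and every timing value is a hypothetical placeholder.

```python
import time


def load_work(batches, is_degraded, load_batch, pause=30.0, max_pause=600.0,
              sleep=time.sleep):
    """Load each batch, pausing while `is_degraded()` reports the issue.

    `is_degraded` and `load_batch` are caller-supplied callables; the
    default timings are illustrative assumptions only. Returns the list
    of pauses taken, which is handy for logging.
    """
    waits = []
    for batch in batches:
        delay = pause
        while is_degraded():
            # Step aside for volunteer-facing traffic, doubling the pause
            # (up to `max_pause`) each time the check still fails.
            sleep(delay)
            waits.append(delay)
            delay = min(max_pause, delay * 2)
        load_batch(batch)
    return waits
```

Passing `sleep` in as a parameter keeps the sketch easy to try without real waiting, for example by handing it a function that just records the delays.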
||
|
|
|