| World Community Grid Forums
Thread Status: Active Total posts in this thread: 19
|
|
| Author |
|
|
Steve W
Advanced Cruncher Joined: Dec 9, 2005 Post Count: 110 Status: Offline Project Badges:
|
Reminder that there is system maintenance this weekend, on the 12th, which could last up to 20 hours.
Details here. |
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
Thanks for the reminder
---------------------------------------- Cheers Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges:
|
Thanks Steve for the reminder / heads up.
----------------------------------------CJSL Crunching for a better world... |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I normally run all projects with zero resource share so that no work gets buffered, just enough to occupy the active cores. For the outage I raised the share to 0.1 and increased the buffer to 1 day. What this demonstrated, without touching any buttons, is that the limit is 30 work units per call, not 3 as someone was complaining, and with a back-off interval of 2:01 minutes between calls I received a total of 90. The censored log extract below shows that initially 674k seconds were requested and 152k seconds were sent. Then 523k seconds were requested and 148k seconds (148251) were sent for another 30 tasks, and finally a request for 375k seconds was honored with 342k seconds for another 30 tasks. With this being MCM only, how the estimated runtimes managed to swing that much within 4 minutes is one of the wonders of BOINC.
2751017 Default
10215 World Community Grid 7/11/2014 3:32:24 PM Sending scheduler request: To fetch work.
10216 World Community Grid 7/11/2014 3:32:24 PM Requesting new tasks for CPU
10217 World Community Grid 7/11/2014 3:32:24 PM [sched_op] CPU work request: 674267.70 seconds; 0.00 devices
10218 World Community Grid 7/11/2014 3:32:29 PM Scheduler request completed: got 30 new tasks
10219 World Community Grid 7/11/2014 3:32:29 PM [sched_op] Server version 701
10220 World Community Grid 7/11/2014 3:32:29 PM Project requested delay of 121 seconds
10221 World Community Grid 7/11/2014 3:32:29 PM [sched_op] estimated total CPU task duration: 152015 seconds
10285 World Community Grid 7/11/2014 3:34:34 PM [sched_op] Starting scheduler request
10286 World Community Grid 7/11/2014 3:34:34 PM Sending scheduler request: To fetch work.
10287 World Community Grid 7/11/2014 3:34:34 PM Requesting new tasks for CPU
10288 World Community Grid 7/11/2014 3:34:34 PM [sched_op] CPU work request: 523022.55 seconds; 0.00 devices
10289 World Community Grid 7/11/2014 3:34:39 PM Scheduler request completed: got 30 new tasks
10290 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Server version 701
10291 World Community Grid 7/11/2014 3:34:39 PM Project requested delay of 121 seconds
10292 World Community Grid 7/11/2014 3:34:39 PM [sched_op] estimated total CPU task duration: 148251 seconds
10293 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Deferring communication for 00:02:01
10294 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Reason: requested by project
10357 World Community Grid 7/11/2014 3:36:44 PM [sched_op] Starting scheduler request
10358 World Community Grid 7/11/2014 3:36:44 PM Sending scheduler request: To fetch work.
10359 World Community Grid 7/11/2014 3:36:44 PM Requesting new tasks for CPU
10360 World Community Grid 7/11/2014 3:36:44 PM [sched_op] CPU work request: 375404.12 seconds; 0.00 devices
10361 World Community Grid 7/11/2014 3:36:48 PM Scheduler request completed: got 30 new tasks
10362 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Server version 701
10363 World Community Grid 7/11/2014 3:36:48 PM Project requested delay of 121 seconds
10364 World Community Grid 7/11/2014 3:36:48 PM [sched_op] estimated total CPU task duration: 341573 seconds
10365 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Deferring communication for 00:02:01
10366 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Reason: requested by project

After some agent rumbling (version 7.3.18), I now have 1:12 days per thread, but given the short batches in between, that is probably not too much to bridge the outage. The agent has 24 hours to figure it out and either increase or decrease before the 'project is offline for maintenance'. I will set the buffer back to 0.00 as soon as the project goes offline. If that then turns out not to be enough until WCG returns, the other backup projects will start being asked. Of course, if all 90 complete before the return, no more work will be requested from WCG anyhow, since the number of outstanding uploads will disable work requests until the uploads succeed, or until the number of open uploads gets below twice the number of computing threads. My 2 bolivars |
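The swing in estimated runtimes mentioned in the post above can be checked with a little arithmetic. This is only a rough sketch: the three (requested, sent, tasks) triples are copied from the `[sched_op]` lines of the log extract, while the `calls` list and the `summarize` helper are names made up for illustration.

```python
# Figures copied from the [sched_op] log lines above:
# (seconds requested, estimated seconds sent, tasks received)
calls = [
    (674267.70, 152015, 30),
    (523022.55, 148251, 30),
    (375404.12, 341573, 30),
]

def summarize(calls):
    """Return per-call shortfall and average estimated runtime per task."""
    rows = []
    for requested, sent, tasks in calls:
        rows.append({
            "requested": requested,
            "sent": sent,
            "shortfall": requested - sent,   # work still wanted after the call
            "per_task": sent / tasks,        # average estimate per task, seconds
        })
    return rows

rows = summarize(calls)
total_tasks = sum(tasks for _, _, tasks in calls)

for i, row in enumerate(rows, start=1):
    print(f"call {i}: shortfall {row['shortfall']:.2f} s, "
          f"avg estimate {row['per_task'] / 3600:.2f} h per task")
print(f"total tasks: {total_tasks}")
```

The first two calls average roughly 1.4 hours per task, while the third averages more than twice that, which is the jump the post is puzzling over. The shortfall after each call is also close to, but not identical to, the size of the next request.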
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
A little bit of an update. The change will cause various parts of the website to go down. The main part which everyone is concerned about is the downloading and uploading of new work. This will be down the longest. The website is probably going to be down for the time it takes to reboot a server, so probably 15 minutes during this entire maintenance window.
Basically if you don't want your computers to run dry, I would suggest bumping up your caches to 1 day. I will try to keep everyone up to date on how things are progressing as this is a major outage, but in the long run will make our system run smoother. Thanks, -Uplinger |
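For anyone who prefers to set the 1 day cache locally rather than through the website or the manager, here is a minimal sketch that builds the contents of a BOINC `global_prefs_override.xml`. It assumes the standard `work_buf_min_days` and `work_buf_additional_days` preference elements; the `build_override` helper is a made-up name, and the script deliberately only prints the XML instead of writing into the BOINC data directory.

```python
import xml.etree.ElementTree as ET

def build_override(min_days: float, extra_days: float = 0.0) -> str:
    """Build override XML asking the client to buffer `min_days` of work.

    Assumption: the client reads these two elements from
    global_prefs_override.xml in its data directory.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{extra_days:.2f}"
    return ET.tostring(root, encoding="unicode")

override_xml = build_override(1.0)
print(override_xml)
```

The client would still need to be told to re-read its local preferences (or be restarted) before the new buffer size takes effect, and the value should be set back once the maintenance is over.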
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for the info/advice, Keith |
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
During the latest outage I got this type of error (no runtime, just came in and went out):
----------------------------------------
Result Name: MCM1_ 0005658_ 0023_ 0--
<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>MCM1_0005658_0023_MCM1_0005658_0023.txt</file_name>
<error_code>-224</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006
[Edit 1 times, last edit by branjo at Jul 11, 2014 7:44:50 PM] |
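If you have several results like this and want to pull out which file failed and with what code, a small sketch along these lines may help. The fragment is the one quoted in the post above; the `parse_xfer_error` helper is a made-up name, and a regular expression is used because the quoted fragment is not a complete XML document on its own.

```python
import re

# The error output quoted in the post above.
stderr_fragment = """
<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>MCM1_0005658_0023_MCM1_0005658_0023.txt</file_name>
<error_code>-224</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>
"""

def parse_xfer_error(text):
    """Extract file name, numeric error code and message, if present."""
    fields = {}
    for tag in ("file_name", "error_code", "error_message"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.S)
        fields[tag] = match.group(1).strip() if match else None
    if fields["error_code"] is not None:
        fields["error_code"] = int(fields["error_code"])
    return fields

err = parse_xfer_error(stderr_fragment)
print(err)
```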
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
Branjo, that sounds like you got unlucky and tried to download the file when the http server was offline. There was about a 15 minute window when the http servers were completely offline. I would doubt it was a file that was downloaded partially because those txt files are relatively small if my memory serves me correctly.
We will be gracefully shutting down the HTTP servers this time for the change, and the main website should stay up. FYI, the change window will start in a little over 1 hour. Fill your buffers. Thanks, -Uplinger |
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
Thanks uplinger - I had enough WUs in the queue, so it was not a problem for me
---------------------------------------- Cheers Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|
|