World Community Grid - View Thread - Planned System Maintenance

Thread: Planned System Maintenance - 6th & 12th July

Thread Status: Active
Total posts in this thread: 19

This topic has been viewed 3478 times and has 18 replies

Steve W
Advanced Cruncher
Joined: Dec 9, 2005
Post Count: 110
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

Reminder that there is system maintenance this weekend, on the 12th, which could last up to 20 hours.

Details here.

[Jul 11, 2014 10:24:07 AM]

branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

Thanks for the reminder.

Cheers

----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Jul 11, 2014 11:44:45 AM]

cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

Thanks Steve for the reminder / heads up.

CJSL

Crunching for a better world...

----------------------------------------

I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team

[Jul 11, 2014 12:02:46 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Planned System Maintenance - 6th & 12th July

I normally run all projects with zero share so as not to get any buffered work, just enough to occupy the active cores. For the outage I upped the share to 0.1 and increased the buffer to 1 day. What this demonstrated, without touching any buttons, was that the limit is 30 work units per call, not 3 as someone was complaining, and that with a back-off interval of 2:01 minutes between calls I received a total of 90. The censored log extract below shows that initially 674k seconds were requested and 152k seconds were sent. Then 523k seconds were requested and 148k seconds were sent, for another 30 tasks, and finally 375k seconds were requested and 342k seconds were sent, for another 30 tasks. With this being only MCM, how the estimated runtimes managed to swing that much within 4 minutes is one of the wonders of BOINC.

2751017 Default

10215 World Community Grid 7/11/2014 3:32:24 PM Sending scheduler request: To fetch work.
10216 World Community Grid 7/11/2014 3:32:24 PM Requesting new tasks for CPU
10217 World Community Grid 7/11/2014 3:32:24 PM [sched_op] CPU work request: 674267.70 seconds; 0.00 devices
10218 World Community Grid 7/11/2014 3:32:29 PM Scheduler request completed: got 30 new tasks
10219 World Community Grid 7/11/2014 3:32:29 PM [sched_op] Server version 701
10220 World Community Grid 7/11/2014 3:32:29 PM Project requested delay of 121 seconds
10221 World Community Grid 7/11/2014 3:32:29 PM [sched_op] estimated total CPU task duration: 152015 seconds
10285 World Community Grid 7/11/2014 3:34:34 PM [sched_op] Starting scheduler request
10286 World Community Grid 7/11/2014 3:34:34 PM Sending scheduler request: To fetch work.
10287 World Community Grid 7/11/2014 3:34:34 PM Requesting new tasks for CPU
10288 World Community Grid 7/11/2014 3:34:34 PM [sched_op] CPU work request: 523022.55 seconds; 0.00 devices
10289 World Community Grid 7/11/2014 3:34:39 PM Scheduler request completed: got 30 new tasks
10290 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Server version 701
10291 World Community Grid 7/11/2014 3:34:39 PM Project requested delay of 121 seconds
10292 World Community Grid 7/11/2014 3:34:39 PM [sched_op] estimated total CPU task duration: 148251 seconds
10293 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Deferring communication for 00:02:01
10294 World Community Grid 7/11/2014 3:34:39 PM [sched_op] Reason: requested by project
10357 World Community Grid 7/11/2014 3:36:44 PM [sched_op] Starting scheduler request
10358 World Community Grid 7/11/2014 3:36:44 PM Sending scheduler request: To fetch work.
10359 World Community Grid 7/11/2014 3:36:44 PM Requesting new tasks for CPU
10360 World Community Grid 7/11/2014 3:36:44 PM [sched_op] CPU work request: 375404.12 seconds; 0.00 devices
10361 World Community Grid 7/11/2014 3:36:48 PM Scheduler request completed: got 30 new tasks
10362 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Server version 701
10363 World Community Grid 7/11/2014 3:36:48 PM Project requested delay of 121 seconds
10364 World Community Grid 7/11/2014 3:36:48 PM [sched_op] estimated total CPU task duration: 341573 seconds
10365 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Deferring communication for 00:02:01
10366 World Community Grid 7/11/2014 3:36:48 PM [sched_op] Reason: requested by project
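The figures quoted from the log can be cross-checked with a few lines of Python. This is only a rough sketch: the three request/response pairs are copied from the extract above with timestamps dropped, and the names `LOG` and `summarize` are made up for illustration, not part of any BOINC tool.

```python
import re

# The [sched_op] request/response lines from the extract, timestamps dropped.
LOG = """\
[sched_op] CPU work request: 674267.70 seconds; 0.00 devices
Scheduler request completed: got 30 new tasks
[sched_op] estimated total CPU task duration: 152015 seconds
[sched_op] CPU work request: 523022.55 seconds; 0.00 devices
Scheduler request completed: got 30 new tasks
[sched_op] estimated total CPU task duration: 148251 seconds
[sched_op] CPU work request: 375404.12 seconds; 0.00 devices
Scheduler request completed: got 30 new tasks
[sched_op] estimated total CPU task duration: 341573 seconds
"""


def summarize(log):
    """Return (seconds requested, tasks received, seconds sent), one entry per call."""
    requested = [float(x) for x in re.findall(r"CPU work request: ([\d.]+) seconds", log)]
    tasks = [int(x) for x in re.findall(r"got (\d+) new tasks", log)]
    sent = [int(x) for x in re.findall(r"estimated total CPU task duration: (\d+) seconds", log)]
    return requested, tasks, sent


requested, tasks, sent = summarize(LOG)
for req, n, got in zip(requested, tasks, sent):
    # Average estimated runtime per task shows how much the estimate swings per call.
    print(f"asked {req:>10.0f} s, got {n} tasks worth {got} s ({got / n:.0f} s per task)")
print("total tasks:", sum(tasks), "total estimated seconds:", sum(sent))
```

The per-task average in the third call comes out at more than twice that of the first two, which is the swing in estimated runtimes mentioned above.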

After some rumbling by the agent (version 7.3.18), I now have 1:12 days of work per thread but, given the short batches in between, that is probably not too much to bridge the outage. The agent has 24 hours to figure it out and either increase or decrease the buffer before the 'project is offline for maintenance'.

I will set the buffer back to 0.00 as soon as the project goes offline. If that then turns out not to be enough until WCG returns, another backup project will start being asked for work. Of course, if all 90 complete before WCG returns, no more work will be requested from WCG anyhow, since the number of outstanding uploads disables work requests until the uploads succeed or the number of open uploads gets below twice the number of computing threads.
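The upload rule described here can be written down as a one-line check. This is a sketch of the rule as stated in this post, not code taken from the BOINC client; the function name, the parameter names and the 8-thread example are assumptions for illustration.

```python
def work_fetch_allowed(pending_uploads: int, threads: int, factor: int = 2) -> bool:
    """Rule as described in the post: work is requested again only once the
    number of open uploads is below (computing threads * factor)."""
    return pending_uploads < threads * factor


# Hypothetical 8-thread host: the threshold is 8 * 2 = 16 open uploads.
print(work_fetch_allowed(90, 8))  # all 90 results stuck waiting to upload
print(work_fetch_allowed(15, 8))  # backlog has drained below the threshold
```

With all 90 results waiting to upload, the check stays false for the whole outage, so the client would not ask WCG for more work until the backlog has mostly cleared.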

my 2 bolivars

[Jul 11, 2014 2:04:08 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

A little bit of an update. The change will cause various parts of the website to go down. The main part which everyone is concerned about is the downloading and uploading of new work. This will be down the longest. The website is probably going to be down for the time it takes to reboot a server, so probably 15 minutes during this entire maintenance window.

Basically if you don't want your computers to run dry, I would suggest bumping up your caches to 1 day.
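As a rough way to see why a 1 day cache is enough, the sketch below compares buffered work against the outage length. It assumes every thread crunches continuously and that the runtime estimates are accurate; the function names and the 4-thread host are hypothetical, and the 20 hour figure is the upper bound quoted in the reminder at the top of the thread.

```python
def hours_covered(buffered_cpu_seconds: float, threads: int) -> float:
    """Hours of crunching the buffer lasts if all threads stay busy."""
    return buffered_cpu_seconds / threads / 3600


def runs_dry(buffered_cpu_seconds: float, threads: int, outage_hours: float) -> bool:
    """True if the buffer would be used up before the outage ends."""
    return hours_covered(buffered_cpu_seconds, threads) < outage_hours


# Hypothetical 4-thread host with a 1 day cache: 4 threads * 86400 s each.
one_day_cache = 4 * 86400
print(hours_covered(one_day_cache, 4))  # 24.0
print(runs_dry(one_day_cache, 4, 20))   # False
```

In practice the estimates drift, so the real margin can be smaller or larger than this simple division suggests.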

I will try to keep everyone up to date on how things are progressing as this is a major outage, but in the long run will make our system run smoother.

Thanks,
-Uplinger

[Jul 11, 2014 3:11:10 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Planned System Maintenance - 6th & 12th July

Thanks for the info/advice, Keith.

[Jul 11, 2014 3:23:48 PM]

branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

During the latest outage I got this type of error (no runtime, just came in and went out):

Result Name: MCM1_0005658_0023_0--

<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>MCM1_0005658_0023_MCM1_0005658_0023.txt</file_name>
<error_code>-224</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>

----------------------------------------
[Edit 1 times, last edit by branjo at Jul 11, 2014 7:44:50 PM]

[Jul 11, 2014 7:43:16 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

Branjo, that sounds like you got unlucky and tried to download the file while the HTTP server was offline. There was a window of about 15 minutes when the HTTP servers were completely offline. I doubt it was a partially downloaded file, because those txt files are relatively small, if my memory serves me correctly.

This time we will be shutting down the HTTP servers gracefully for the change; the main website should stay up.

FYI, the change window will start in a little over 1 hour. Fill your buffers.

Thanks,
-Uplinger

[Jul 11, 2014 11:48:40 PM]

branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline

Re: Planned System Maintenance - 6th & 12th July

Thanks uplinger - I had enough WUs in the queue, so there was no problem for me.

Cheers

[Jul 12, 2014 7:15:27 PM]
