World Community Grid - View Thread - Server Errors. [ RESOLVED ]

World Community Grid Forums

Category: Support

Forum: Website Support

Thread: Server Errors. [ RESOLVED ]



Thread Status: Active
Total posts in this thread: 352



This topic has been viewed 2198887 times and has 351 replies

RCC_Survivor
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 1337
Status: Offline


Re: Server Errors.

Whatever the attribution (extended Italian holidays, economics, project choice/continuity, server problems), what stands out is the deepest summer decline seen in WCG history.

Let's work to ramp this thing back up to new records.

You talking to me?
Everything I have is running 24/7/365.
It has been that way for years.
There is nothing I can do to improve the stats.

----------------------------------------

Be kinder than necessary, for everyone you meet is fighting some battle.

Please join the team The survivors

Bilateral Renal, Melanoma, and Squamous Cell cancers

[Aug 31, 2012 2:18:57 PM]

astrolabe.
Senior Cruncher
Joined: May 9, 2011
Post Count: 496
Status: Offline


Re: Server Errors.

Let's work to ramp this thing back up to new records.

You talking to me?
Everything I have is running 24/7/365.
It has been that way for years.
There is nothing I can do to improve the stats.

Sometimes stuff gets said on the Forums that just doesn't make sense. Let me see. My rank is 439 out of 601,121 registered, or ahead of 99.927%. Don't know who Sekerob is talking to, but it obviously isn't me, as I appear to be carrying my portion of the load. I'd crunch a few more today, but it looks like the system outage is going to bite me. If ramping is required, then maybe someone had better ramp up the system so we can get back to doing what we have been doing all along.

[Aug 31, 2012 2:42:47 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Server Errors.

Been crunchin' 24/7/365 here also, but have had to reduce due to abnormally high summer temps. Should be coming to an end next week, hopefully.

[Aug 31, 2012 2:53:03 PM]

robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 445
Status: Offline


Re: Server Errors.

Is there currently a frequent problem with uploading the output files of workunits? My two faster computers both appear to be waiting to finish uploading the output files from a previous workunit before they request any more WCG workunits, and are waiting around 2 hours for the next retry at uploading. They're connected to several other BOINC projects, though, and getting an adequate supply of workunits from those.

[Aug 31, 2012 5:02:17 PM]

Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline


Re: Server Errors.

http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,33701

----------------------------------------

[Aug 31, 2012 5:11:34 PM]

KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline


Re: Server Errors.

And this is why I never let my cache drop below 1.8 days. Just too many occasions where something goes horribly wrong either on my end or theirs. That little extra keeps my machines humming along.

----------------------------------------

Distributed computing volunteer since September 27, 2000

[Aug 31, 2012 7:37:59 PM]

branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline


Re: Server Errors.

Will follow your recommendation. I have a 1-day cache, but it doesn't seem to be enough. 1.5-2 days is reasonable: it is enough to ride out outages like this one, and if a computer crashes it is not that big an amount of lost work for the sub-project(s).

Cheers and NI!

----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006


[Aug 31, 2012 8:02:01 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Server Errors.

Have marked as resolved, happy days.

Thanks, guys, for your work on this.

[Sep 5, 2012 6:09:49 AM]

Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline


Re: Server Errors.

branjo: "I have 1 day cache, but it seems to be not enough. 1.5 - 2 days is reasonable"
My "farm" got through the recent server outage with about 1 hour's work to spare on a cache setting of 1.4-1.5d.
I can remember one other occasion of a long server outage, and that lasted several days. It was over the Christmas holiday period in 2006 or 2007, when the servers were located in Boulder, CO. There was a server crash during a blizzard and techs could not get physical access to the machines. You needed a cache setting of about 3 days to get through that one.
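The cache sizing above is simple arithmetic: a cache setting in days is that many hours of queued work, and whatever the outage doesn't consume is the spare. Below is a minimal sketch, assuming the cache is full when the outage starts and drains at one cache-day per wall-clock day. The function names and the outage lengths in the example are hypothetical, since the thread doesn't give exact outage durations.

```python
def hours_to_spare(cache_days: float, outage_hours: float) -> float:
    """Hours of queued work left when the servers come back.

    Assumes the cache is full at the start of the outage and is consumed
    at one cache-day per wall-clock day. A negative result means the
    machine ran dry that many hours before the outage ended.
    """
    return cache_days * 24.0 - outage_hours


def min_cache_days(outage_hours: float, margin_hours: float = 0.0) -> float:
    """Smallest cache setting (in days) that covers an outage plus a safety margin."""
    return (outage_hours + margin_hours) / 24.0


# Hypothetical numbers: a 1.45-day cache against an outage assumed to last
# about 33.8 hours leaves roughly one hour of work.
print(round(hours_to_spare(1.45, 33.8), 1))  # 1.0
# A multi-day outage, assumed here to be 72 hours, needs about a 3-day cache.
print(min_cache_days(72))  # 3.0
```

The same arithmetic shows why a margin matters: a cache set exactly to the outage length leaves nothing for upload retries or slower-than-expected work units.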

I think it's interesting that WCG found limitations in IBM's commercial GPFS product. It's an indication of just how big "we" are. Better for this to happen in-house than with an external customer, I guess. I expect that the problems will be reported back to IBM HQ, who will make changes to their product. Meanwhile, WCG have not said that they're now running regular scans for large directory-files that have a high proportion of entries for deleted files. They've added more RAM to the servers, but unless there's a lid on the size of the problem they may hit the RAM limit again at some stage. Also, allowing large sparse directories probably increases the amount of CPU time spent searching them and may noticeably degrade system performance.
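The kind of "regular scan" suggested above could look roughly like this. It is a hypothetical heuristic, not anything WCG or IBM has described: on some filesystems a directory file does not shrink when entries are deleted, so a directory whose on-disk size is large relative to its count of live entries may be mostly dead slots. The function name and both thresholds are assumptions, and whether `st_size` behaves this way on GPFS is not confirmed here.

```python
import os


def sparse_directories(root, bytes_per_entry_limit=1024, min_dir_bytes=1 << 20):
    """Flag directories that look large for the number of live entries they hold.

    Hypothetical heuristic: compares a directory's own st_size (the size of
    the directory file, not its contents) with the number of entries still
    present. Returns (path, dir_size_bytes, live_entries) tuples.
    """
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        live = len(dirnames) + len(filenames)
        size = os.stat(dirpath).st_size
        # Skip small directories; only big directory files are worth reporting.
        if size >= min_dir_bytes and size / max(live, 1) > bytes_per_entry_limit:
            flagged.append((dirpath, size, live))
    return flagged
```

A scan like this only reports candidates; compacting a directory usually means recreating it and moving the live entries across, which is an operational decision rather than something a script should do blindly.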

Comments?

[Sep 6, 2012 2:22:32 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Server Errors.

Surely you understand that WCG is also an R&D project for IBM at the same time... things learned here doubtlessly flow back into their knowledge base and the quality of their products/services. Good for them.

[Sep 6, 2012 8:38:49 AM]
