Total posts in this thread: 27
This topic has been viewed 182281 times and has 26 replies.
WCGAdmin
World Community Grid Admin
Joined: Jun 9, 2020
Post Count: 171
Update: July 25 system outage and defective OPNG work units

The system outage experienced due to scheduled maintenance is now complete. We are aware of problems with OPNG work units and are investigating this issue.

https://www.worldcommunitygrid.org/about_us/article.s?articleId=799
[Jul 27, 2023 6:40:06 PM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 345
Re: Update: July 25 system outage and defective OPNG work units

The system outage experienced due to scheduled maintenance is now complete. We are aware of problems with OPNG work units and are investigating this issue.

https://www.worldcommunitygrid.org/about_us/article.s?articleId=799


Can you please confirm whether you have paused the defective OPNG WUs or ALL WUs?

I am out of work; my client reports that no tasks are available.
[Jul 27, 2023 7:13:34 PM]
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Re: Update: July 25 system outage and defective OPNG work units

Hi Bryn,

The tech team has paused distribution of work units for SCC1, OPN1, and OPNG until they identify what the issue is. MCM work units should still be available but I have forwarded your post to the team to investigate what the problem might be.
[Jul 27, 2023 7:30:33 PM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Re: Update: July 25 system outage and defective OPNG work units

Hi Bryn,

The tech team has paused distribution of work units for SCC1, OPN1, and OPNG until they identify what the issue is. MCM work units should still be available but I have forwarded your post to the team to investigate what the problem might be.

Thanks for checking with the techs. The only WUs I've gotten in about four hours are ~10 MCM resends, nothing new.
[Jul 27, 2023 8:08:19 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 802
Re: Update: July 25 system outage and defective OPNG work units

All I'm seeing is a bunch of "Server is out of disk space" messages, so nothing is uploading.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jul 28, 2023 2:00:02 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 802
Re: Update: July 25 system outage and defective OPNG work units

Is there any word on the major incident that happened starting last week that took all systems down (WCG BOINC, WCG website, WCG forums) even before the planned maintenance period?

It doesn't seem safe that all of those systems tend to go down at exactly the same time, since they should (ideally) be fairly isolated from one another.

I tried checking Twitter for updates, but apparently Twitter requires an account to even see public posts now, which is inconvenient. I block Facebook in NoScript for privacy reasons but used another web browser to check WCG on Facebook, and there weren't any updates. Just a lot of people really frustrated at the lack of communication for 4-5 days and then one troll/disturbed person fighting a lot of people. It was a big turn-off.

I've worked in IT for quite a few years, and any time a Sev 1 ("Severity" aka Priority 1) major incident happens, there are always Service Level Agreements (SLAs) to be met, and people are paged 24/7 to fix the incident and to provide updates and documentation to management and all affected stakeholders (executives and customers). Even when I was busy and stressed figuring out an issue, I still had to at least tell everyone what on earth was happening, what we were doing to fix it, and a rough ETA if we had one. If I had ghosted end users for hours, let alone 4-5 days, I'd have been fired.

Edited to Add: I understand that WCG doesn't have the financial resources to support the staffing levels it needs (such as a small team), so this is comparing apples to oranges. I think we understand that and empathize with the workload. And to be fair, while we volunteers do contribute a lot of expensive computing resources for free, WCG is not a revenue-generating corporation or government agency where downtime immediately affects the organization's bottom line. I think the biggest frustration is simply the lack of communication and updates, which doesn't take lots of $$$ at all.
----------------------------------------
[Edit 7 times, last edit by hchc at Jul 28, 2023 7:04:35 AM]
[Jul 28, 2023 2:06:44 AM]
roundup
Veteran Cruncher
Switzerland
Joined: Jul 25, 2006
Post Count: 835
Re: Update: July 25 system outage and defective OPNG work units

The system outage experienced due to scheduled maintenance is now complete. We are aware of problems with OPNG work units and are investigating this issue.

https://www.worldcommunitygrid.org/about_us/article.s?articleId=799

No word on the reasons for the outages before the scheduled maintenance?
No word on the upload still not working properly?

We have many WCG crunchers on our team who have been contributing regularly since the mid-2010s. From these members we hear a similar frustration to that described by hchc here.
It is neither expensive nor time-consuming to send out regular notifications on social media about system bugs, with a brief explanation of the problem and an estimate of how long it will take to fix.
You have to understand that volunteers want to be appreciated, because they invest time and money (hardware, electricity) in WCG. Compared to other worthwhile projects, WCG / Krembil is clearly behind in the quality of its communication. In the long run, the consequence will be that even more computing power is withdrawn from WCG.
Please take this as friendly feedback and a suggestion for improvement. Thank you very much.
[Jul 28, 2023 4:46:36 AM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 232
Re: Update: July 25 system outage and defective OPNG work units

Hi Bryn,

The tech team has paused distribution of work units for SCC1, OPN1, and OPNG until they identify what the issue is. MCM work units should still be available but I have forwarded your post to the team to investigate what the problem might be.


So, any news on those disappearing MCM workunits?
[Jul 28, 2023 1:29:33 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Re: Update: July 25 system outage and defective OPNG work units

Hi Bryn,

The tech team has paused distribution of work units for SCC1, OPN1, and OPNG until they identify what the issue is. MCM work units should still be available but I have forwarded your post to the team to investigate what the problem might be.


So, any news on those disappearing MCM workunits?
Well, she at least is apparently under the impression that everything is still peachy with MCM1.
They might first have to forward this to the tech team to investigate (though they received the first reports about this more than a week ago, before the latest crash-and-burn; but hey, better to censor supposedly unruly users than to actually react to notifications that problems are cropping up once again).


Ralf
[Jul 28, 2023 4:14:16 PM]
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Re: Update: July 25 system outage and defective OPNG work units



So, any news on those disappearing MCM workunits?

Hi thunder7,

An MCM1 work unit update was just posted.
[Jul 28, 2023 6:34:30 PM]