Thread Status: Active
Total posts in this thread: 16
This topic has been viewed 150627 times and has 15 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1993
Re: Update: July 21st system outage

So does this mean the outage / maintenance that was planned for the 25th is past, or were you unable to do that because systems were down and will that have to happen at some point in the future?
There was a post by WCGAdmin (whoever that might be) saying that the scheduled maintenance did finish...

Ralf
----------------------------------------

[Aug 3, 2023 4:42:13 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Update: July 21st system outage

Thank you for the explanation of the outage, TigerLily, but ...
why was there almost total silence from WCG for about 6 days during the outage?
The WCG internet home page, real or mocked-up standby, was up most of the time.
Could very brief updates not have been placed on that, say daily, just to let us, your research partners, know that WCG was not being closed down permanently?

Personally, I recently made a significant donation to WCG using after-tax dollars, and I think we deserve better treatment.
[Aug 3, 2023 3:28:39 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 837
Re: Update: July 21st system outage

You are running servers on DHCP addresses?

Well, why am I surprised about this? The common good practice that any machine that needs to be reachable from anywhere gets a static IP address just doesn't seem to be known or followed...


Ralf

You're absolutely right, Ralf. Forgot to notice that. My access point, printer, all servers/services are assigned statically even at home. Only client devices get DHCP leases.

To prevent this from happening again, WCG network and server admins can simply make sure all servers/services, routers, switches, load balancers, etc. are using static network configs. (Of course, if further changes are made to the infrastructure, that'll break things, naturally.)
Isn't the use of DHCP a consequence of the use of VMs to run services? It's not quite the same as a home or small office network :-) -- something like a network file server might be eligible for a fixed address, but VMs are more akin to client services (even if they are being used to run server-type software). I think WCG are running everything but their network file server on VMs...

Cheers - Al.

Nope, not at all. Physical server or virtual server doesn't matter because they appear to the rest of the world as the same. The best practice when spinning up a new server instance is to work with the network team to make sure it gets put in the right network VLAN and have an available static IP/subnet mask/etc so that DHCP is completely out of the picture.

It's not just a SOHO/SMB practice, as you hint; avoiding DHCP is even more important in a large enterprise. Every server that I provisioned (all of them virtual from, say, 2013 onward) got a static IP. If/when DHCP went down, the servers didn't get amnesia and forget who they were or where they were. :P

Maybe in highly elastic cloud environments, where new instances are spun up and shut down for minutes at a time, there is more flexibility, but there'd be software-defined networking controllers that at least coordinate how to reach all the new virtual servers/instances/containers. DHCP wouldn't be a point of failure.

In my opinion, best practice is that DHCP is really for temporary, ephemeral devices like end user/client devices that come and go. So doing more network design and planning upfront and using static configs eliminates such headaches and outages down the road, as we saw.
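
To make the "plan upfront" point a bit more concrete, here is a rough sketch of the kind of check an admin could run against an addressing plan: every server's static address should sit inside the server subnet and outside the range handed out by DHCP, with no duplicates. Everything in it (the subnet, the pool boundaries, the server names, and the `audit_static_addresses` helper) is a made-up assumption for illustration, not anything known about WCG's actual network.

```python
import ipaddress

# Hypothetical addressing plan; every value below is an illustrative assumption.
SUBNET = ipaddress.ip_network("10.0.20.0/24")
DHCP_POOL_START = ipaddress.ip_address("10.0.20.100")
DHCP_POOL_END = ipaddress.ip_address("10.0.20.199")

# Hypothetical inventory: server name -> statically configured address.
SERVERS = {
    "web-01": "10.0.20.10",
    "db-01": "10.0.20.11",
    "lb-01": "10.0.20.150",  # deliberately wrong: inside the DHCP pool
    "nfs-01": "10.0.21.5",   # deliberately wrong: outside the subnet
}


def audit_static_addresses(servers, subnet, pool_start, pool_end):
    """Return (name, problem) pairs for addresses that break the plan."""
    problems = []
    seen = {}  # address -> first server name that claimed it
    for name, raw in servers.items():
        addr = ipaddress.ip_address(raw)
        if addr not in subnet:
            problems.append((name, f"{addr} is outside {subnet}"))
        elif pool_start <= addr <= pool_end:
            problems.append((name, f"{addr} overlaps the DHCP pool"))
        if addr in seen:
            problems.append((name, f"{addr} is already used by {seen[addr]}"))
        else:
            seen[addr] = name
    return problems


if __name__ == "__main__":
    for name, problem in audit_static_addresses(
        SERVERS, SUBNET, DHCP_POOL_START, DHCP_POOL_END
    ):
        print(f"{name}: {problem}")
```

A check like this only validates the plan on paper; it does not prove the hosts are actually configured statically, which would still need to be verified on each machine.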
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 2 times, last edit by hchc at Aug 3, 2023 8:53:04 PM]
[Aug 3, 2023 8:50:25 PM]
markfw
Cruncher
Joined: Oct 13, 2016
Post Count: 22
Re: Update: July 21st system outage

When are the stats exports going to be fixed, so that free-dc.org can read them?
[Aug 3, 2023 9:53:36 PM]
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Re: Update: July 21st system outage

Hi markfw,

Our tech team has been communicating with our contact at the data centre where our servers reside to try to address this issue. They are working to investigate and resolve the problem, but unfortunately I am unable to provide a time frame at this point.
[Aug 4, 2023 8:25:54 PM]
markfw
Cruncher
Joined: Oct 13, 2016
Post Count: 22
Re: Update: July 21st system outage

Thank you very much for the update.
[Aug 4, 2023 10:23:35 PM]