Total posts in this thread: 16
Posts: 16   Pages: 2   [ 1 2 | Next Page ]
Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Status: Offline
2022-11-18 Update (Network and Storage)

Hi everyone, an update on network connection and storage.

We are working with SHARCNET (an HPC site where the WCG servers and storage reside) to resolve the network congestion events we have been experiencing. For volunteers, these events show up as unpredictable downtime of the website and forums database and as repeated interruptions when attempting to download workunits. At this time, we believe the root cause to be a limitation or bug in the OpenStack software through which our virtual environment is provisioned at SHARCNET.

To help ameliorate the worst effects of this issue, SHARCNET have re-routed all WCG traffic through a new network node. Effectively, this separates WCG traffic from that of other users and deployments unrelated to the WCG that are colocated at the SHARCNET HPC facility. We have already seen a benefit from this change, and it could help us to further diagnose and optimize additional performance issues.

We have also reduced the maximum concurrent connections permitted on the download servers at SHARCNET’s request, and reduced the maximum number of packages available for download at any one time. Although these adjustments might suggest lower throughput, they have been active since November 11 and are in fact improving the overall throughput of WCG by making the network more stable. However, we are still seeing events inside our environment in which the load balancer and the servers behind it are sometimes unable to communicate with each other.
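
As a rough illustration of the idea of capping concurrent connections (this is not WCG's or SHARCNET's actual server configuration), the sketch below uses Python's `asyncio.Semaphore` to limit how many simulated downloads run at once. The function name `simulate_downloads`, the request count, and the cap of 4 are all hypothetical values chosen for the example.

```python
import asyncio


async def simulate_downloads(n_requests: int, max_concurrent: int) -> int:
    """Run n_requests simulated downloads and return the peak number
    that were in progress at the same time."""
    # The semaphore plays the role of a "maximum concurrent connections" cap.
    limiter = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def handle_request() -> None:
        nonlocal active, peak
        async with limiter:  # wait here if the cap has been reached
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the actual transfer
            active -= 1

    await asyncio.gather(*(handle_request() for _ in range(n_requests)))
    return peak


# Hypothetical numbers: 20 queued requests, at most 4 served at once.
peak = asyncio.run(simulate_downloads(n_requests=20, max_concurrent=4))
print(f"peak concurrent downloads: {peak}")
```

All 20 requests still complete; the cap only bounds how many are in flight together, which is the kind of trade-off that can steady an overloaded network path.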

Importantly, the bandwidth provided to the WCG environment at SHARCNET is nowhere near saturated during these events, so this is not an issue of capacity. We are working to resolve this and will provide an update on our progress as soon as we have new information. Once this is resolved, we will be in a position to fully restart and bring new projects to the Grid.

The new and faster storage server is physically installed at SHARCNET now and will be connected to the rest of the WCG servers next week. The primary benefit of the new storage array is the SSD storage that comes with it, which will increase performance of many key components that currently rely on NFS shares of logical volumes that are composed of HDD storage only.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team at Krembil Research Institute
----------------------------------------
[Edited 1 time, last edit by Cyclops at Nov 18, 2022 6:08:08 PM]
[Nov 18, 2022 6:00:01 PM]
Just Jake
Cruncher
Joined: Nov 15, 2018
Post Count: 24
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

The backlog I used to see for downloads hasn't occurred recently and whatever WCG/SHARCNET have currently operating is keeping my 18 cores and 36 threads fully fed and crunching away merrily. Thanks Team!
[Nov 18, 2022 7:06:35 PM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 278
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

This is the best and most detailed update yet. It's also a level of detail I think most of us here want to see. Thank you for the update.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
[Nov 19, 2022 12:39:10 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Thank you for the update. I am glad to see this level of detail, especially about the problems and their potential solutions.
Just for good measure, I have not seen any download problems lately.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 19, 2022 2:27:11 AM]
bluestang
Senior Cruncher
USA
Joined: Oct 1, 2010
Post Count: 274
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Who thought putting WCG on HDDs and not SSDs was a good idea?

Sorry, but SHARCNET should have known better than that!
----------------------------------------
[Nov 19, 2022 4:08:41 AM]
nivrip
Senior Cruncher
North Yorkshire
Joined: Sep 13, 2007
Post Count: 285
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Update much appreciated. I'd just like to say that uploading and downloading seem to have been trouble-free over the last week. Long may it continue.

However, I am seeing very few GPU WUs over this period.
----------------------------------------
ЮРКШИР КРУНЧЕР
[Nov 19, 2022 11:51:06 AM]
binventive
Cruncher
Joined: May 3, 2007
Post Count: 13
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

This update is exactly what I have been hoping for in terms of specific details regarding how issues are being dealt with in the short and long-term.

Over the past week or so, I have not noticed any stuck/repeated downloads or intermittent unavailability of tasks. I am hoping this trend continues!
----------------------------------------
[Nov 19, 2022 1:18:53 PM]
Vester
Senior Cruncher
USA
Joined: Nov 18, 2004
Post Count: 325
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Thank you for the update.

GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer.
----------------------------------------

[Nov 19, 2022 3:24:00 PM]
TheRealYeti
Cruncher
Germany
Joined: Nov 19, 2005
Post Count: 19
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Quoting Vester: "Thank you for the update. GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer."

Yeah, I think so, but it would only make sense if it reduced the download volume.
----------------------------------------


Supporting BOINC, a great concept !
[Nov 19, 2022 4:19:50 PM]
Bryn Mawr
Senior Cruncher
Joined: Dec 26, 2018
Post Count: 384
Status: Offline
Re: 2022-11-18 Update (Network and Storage)

Quoting Vester: "Thank you for the update. GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer."

But then, each one takes about 4 hours on mine :-)
[Nov 19, 2022 7:57:18 PM]