World Community Grid Forums
Thread Status: Active
Total posts in this thread: 16
Cyclops
Senior Cruncher Joined: Jun 13, 2022 Post Count: 295
Hi everyone, here is an update on network connectivity and storage.

We are working with SHARCNET (the HPC site where the WCG servers and storage reside) to resolve the network congestion events we have been experiencing. For volunteers, these events show up as intermittent downtime of the website and forums database, and as constant interruptions when trying to download workunits.

At this time, we believe the root cause is a limitation or bug in the OpenStack software through which our virtual environment at SHARCNET is provisioned. To mitigate the worst effects of this issue, SHARCNET has re-routed all WCG traffic through a new network node. This effectively separates WCG traffic from that of other users and deployments, unrelated to WCG, that are colocated at the SHARCNET HPC facility. We have already seen a benefit from this change, and it could help us further diagnose and resolve additional performance issues.

At SHARCNET's request, we have also reduced the maximum number of concurrent connections permitted on the download servers, and reduced the maximum number of packages available for download at any one time. Although these adjustments might be expected to lower throughput, they have been in effect since November 11 and are in fact improving the overall throughput of WCG by stabilizing the network to a degree.

However, we are still seeing events inside our environment where the load balancer and the servers behind it are sometimes unable to communicate with each other. Importantly, the bandwidth provided to the WCG environment at SHARCNET is nowhere near saturated during these events, so this is not a capacity issue. We are working to resolve this and will post an update on our progress as soon as we have new information. Once it is resolved, we will be in a position to fully restart and bring new projects to the Grid.

The new, faster storage server is now physically installed at SHARCNET and will be connected to the rest of the WCG servers next week. The primary benefit of the new storage array is its SSD storage, which will improve the performance of many key components that currently rely on NFS shares of logical volumes built solely on HDD storage.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team at Krembil Research Institute
[Edit 1 times, last edit by Cyclops at Nov 18, 2022 6:08:08 PM]
Just Jake
Cruncher Joined: Nov 15, 2018 Post Count: 24
The download backlog I used to see hasn't occurred recently, and whatever WCG/SHARCNET currently have operating is keeping my 18 cores and 36 threads fully fed and crunching away merrily. Thanks, Team!
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 278
This is the best and most detailed update yet. It's also a level of detail I think most of us here want to see. Thank you for the update.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
Thank you for the update. I am glad to see this level of detail, especially about the problems and their potential solutions.
Just for good measure, I have not seen any download problems lately.
Cheers
Sgt. Joe
*Minnesota Crunchers*
bluestang
Senior Cruncher USA Joined: Oct 1, 2010 Post Count: 274
Who thought putting WCG on HDDs and not SSDs was a good idea?
Sorry, but SHARCNET should have known better than that!
nivrip
Senior Cruncher North Yorkshire Joined: Sep 13, 2007 Post Count: 285
Update much appreciated. I'd just like to say that uploading and downloading seem to have been trouble-free over the last week. Long may it continue.
However, I am seeing very few GPU WUs during this period.
----------------------------------------
ЮРКШИР КРУНЧЕР
binventive
Cruncher Joined: May 3, 2007 Post Count: 13
This update is exactly what I have been hoping for in terms of specific details on how issues are being dealt with in the short and long term.
Over the past week or so, I have not noticed any stuck or repeated downloads or intermittent unavailability of tasks. I am hoping this trend continues!
Vester
Senior Cruncher USA Joined: Nov 18, 2004 Post Count: 325
Thank you for the update.
GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer.
TheRealYeti
Cruncher Germany Joined: Nov 19, 2005 Post Count: 19
> Thank you for the update. GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer.

Yeah, I think so, but it would only make sense if this reduced the download volume.
----------------------------------------
Supporting BOINC, a great concept!
Bryn Mawr
Senior Cruncher Joined: Dec 26, 2018 Post Count: 384
> Thank you for the update. GPU tasks need to be longer running. Ten tasks last about eight minutes on my computer.

But then, each one takes about 4 hours on mine :-)