World Community Grid Forums
Category: Official Messages | Forum: News | Thread: 2022-09-15 Update (Networking & Workunits)
Thread Status: Active | Total posts in this thread: 214
Robokapp
Senior Cruncher | Joined: Feb 6, 2012 | Post Count: 248 | Status: Offline
But yes, people mashing the update button doesn't help the system, and it's an incredibly wasteful use of one's time. Scripts and autoclickers do the mashing; I doubt any noteworthy amount of retries is done by hand. Sadly, they also mash when no mash is needed, further stressing the server.
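For illustration only, here is a minimal sketch of what a polite retry loop could look like, in Python. The fetch_work() callable is hypothetical, standing in for whatever a script would invoke; BOINC clients already do their own backoff internally, so this is purely illustrative of why randomized exponential backoff beats fixed-interval mashing.

```python
import random
import time

def fetch_with_backoff(fetch_work, max_attempts=8, base_delay=5.0, cap=3600.0):
    """Retry a work fetch politely with exponential backoff and jitter.

    fetch_work is a hypothetical callable returning True on success.
    """
    for attempt in range(max_attempts):
        if fetch_work():
            return True
        # Double the delay window each attempt (capped), then pick a
        # random point inside it ("full jitter") so clients don't retry
        # in lockstep and hit the server at the same instant.
        time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
    return False
```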
marbesoz
Cruncher | Joined: Jul 4, 2020 | Post Count: 8 | Status: Offline
Still in a testing phase after six months? We cannot keep being "politically correct"; it is time to say that IBM has transferred WCG to an organization that lacks adequate competence. I am very angry...
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2089 | Status: Offline
Well, according to external stats sites like BoincStats, WCG has only had 16,156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab has such enormous problems with only 16,156 users, just think what will happen if the rest of the WCG users decide to return. 16,156 is just a fraction of the number of users WCG had under IBM.

Even if they fix the download problems they have now, things will come to a screeching halt if even a small share of the pre-transition WCG users returns. This is an infrastructure problem for sure. They need more real iron, not just more VMs. I also really question whether SHARCNET has enough bandwidth to host WCG. This is certainly not looking good. This migration was certainly not well thought out, and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.

[Edit 2 times, last edit by Grumpy Swede at Sep 26, 2022 2:46:17 PM]
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1932 | Status: Offline
> Well, according to external stats sites like BoincStats, WCG has only had 16,156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab has such enormous problems with only 16,156 users, just think what will happen if the rest of the WCG users decide to return. 16,156 is just a fraction of the number of users WCG had under IBM.

I don't remember the exact number now, but the number of ACTIVE users before the move was, I think, around 80,000. So we are at around 20-25% of the previous number of active users. And with it taking pretty much a year to get things going again, I am not sure we will (ever) get back to those pre-move numbers...

> This is an infrastructure problem for sure. They need more real iron, not just more VMs. I also really question whether SHARCNET has enough bandwidth to host WCG.

I am not sure if you have followed previous posts by me (and Alan and a couple of others), but the current issues are certainly not a "bandwidth" issue, rather a connection issue. Those two things are decisively different, and the latter is something that needs to be solved at Krembil's end; it's not an ISP issue. The post from Christian ("cubes") kind of confirmed this, and his reply has so far been the first one that makes me think they are finally on the right track...

Haven't seen any OPNG WUs since I got up this morning (there were a couple of batches late last night (PDT)), but so far I have not seen ANY stuck or stalled downloads this morning. But then I might have just jinxed it, and the problems might be back as soon as OPNG WUs are released into the wild again... ;-)

> This is certainly not looking good. This migration was certainly not well thought out,

No argument from me here. I think a lot of the current/recent problems could have been prevented or minimized in the months between the announcement and the move, as well as in the months before they started to bring things back online. (Certificates!)

> and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.

I seriously hope not. Besides, that would be VERY bad publicity for Krembil, and I am not sure that is something they can really afford. It would cast serious doubt on their capability as a research institute...

Ralf
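To make the bandwidth-versus-connection distinction concrete, a rough client-side check is to time the TCP handshake separately from the payload transfer. A minimal Python sketch follows; the host name is illustrative, not necessarily WCG's actual download host.

```python
import socket
import time
import urllib.request

HOST = "download.example.org"  # illustrative; substitute the real download host

# Time the TCP handshake on its own.
t0 = time.monotonic()
sock = socket.create_connection((HOST, 443), timeout=30)
connect_s = time.monotonic() - t0
sock.close()

# Time an actual transfer.
t0 = time.monotonic()
with urllib.request.urlopen(f"https://{HOST}/", timeout=30) as resp:
    body = resp.read()
transfer_s = time.monotonic() - t0

# Slow or failing handshakes with fast transfers once connected point at
# connection handling (proxy limits, port exhaustion); slow transfers
# throughout point at bandwidth or storage instead.
print(f"connect: {connect_s:.2f}s  transfer: {transfer_s:.2f}s  ({len(body)} bytes)")
```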
Paul Schlaffer
Senior Cruncher | USA | Joined: Jun 12, 2005 | Post Count: 242 | Status: Offline
> Well, according to external stats sites like BoincStats, WCG has only had 16,156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab has such enormous problems with only 16,156 users, just think what will happen if the rest of the WCG users decide to return. 16,156 is just a fraction of the number of users WCG had under IBM.
>
> Even if they fix the download problems they have now, things will come to a screeching halt if even a small share of the pre-transition WCG users returns. This is an infrastructure problem for sure. They need more real iron, not just more VMs. I also really question whether SHARCNET has enough bandwidth to host WCG. This is certainly not looking good. This migration was certainly not well thought out, and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.

For the latter part, I sure hope not. I am grateful they were willing to take this on after IBM decided they were done; I've been part of this charitable project from the beginning. However, it's clear they have been in over their heads with this project. If it does at some point move on, it should go to someone like Elon Musk, who would have the resources to regrow the project and make it succeed. There's certainly no argument that we've lost a lot of contributors to the many months of downtime and the buggy restart.

"Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions." – James Madison (1792)

[Edit 2 times, last edit by Paul Schlaffer at Sep 27, 2022 2:47:02 AM]
sam6861
Advanced Cruncher | Joined: Mar 31, 2020 | Post Count: 107 | Status: Offline
> Our load balancer runs HAProxy.

A proxy has a limit of 65,535 TCP ports per address... The HTTP server may be able to handle millions of incoming connections, but if the proxy forwards all of those connections to only one or two backend addresses, then with as few as 65,535 connections to one address (or 131,070 to two) it will run out of TCP ports and produce random "no server available" errors. Unsure if this would help, but maybe a possible workaround is to just add more IPv4/IPv6 addresses to the upload/download server, maybe with a virtual address or something.

Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either a slow network connection or slow server storage, unsure.
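A back-of-the-envelope sketch of that port-exhaustion math, in Python; the ephemeral port range and TIME_WAIT duration below are typical Linux defaults, not measured values from WCG's setup:

```python
# How many new connections per second can a proxy sustain toward a small
# number of backend addresses before running out of source ports?

EPHEMERAL_PORTS = 60999 - 32768 + 1  # default net.ipv4.ip_local_port_range
TIME_WAIT_S = 60                     # typical TIME_WAIT duration on Linux

for backends in (1, 2, 4, 8):
    # Each (source IP, backend IP) pair has its own port space, so the
    # sustainable rate scales with the number of backend addresses,
    # which is exactly why adding addresses is a plausible workaround.
    rate = backends * EPHEMERAL_PORTS / TIME_WAIT_S
    print(f"{backends} backend address(es): ~{rate:,.0f} new connections/s")
```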
TPCBF
Master Cruncher | USA | Joined: Jan 2, 2011 | Post Count: 1932 | Status: Offline
> The HTTP server may be able to handle millions of incoming connections, but if the proxy forwards all of those connections to only one or two backend addresses, then with as few as 65,535 connections to one address (or 131,070 to two) it will run out of TCP ports and produce random "no server available" errors. Unsure if this would help, but maybe a possible workaround is to just add more IPv4/IPv6 addresses to the upload/download server, maybe with a virtual address or something.
>
> Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either a slow network connection or slow server storage, unsure.

Servers equipped with 6 to 8 cores generally achieve between 200,000 and 500,000 requests per second, and have no trouble saturating a 25 Gbit/s connection under Linux.

Ralf
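As a quick sanity check on those figures (a Python sketch using only the numbers from the sentence above): at a few hundred thousand requests per second, a 25 Gbit/s link saturates at quite small per-response sizes, which is why lots of tiny workunit files stress the request path rather than the pipe.

```python
# At what average response size does a given request rate fill 25 Gbit/s?
LINK_BYTES_PER_S = 25e9 / 8  # 25 Gbit/s expressed in bytes per second

for rps in (200_000, 500_000):
    bytes_per_response = LINK_BYTES_PER_S / rps
    print(f"{rps:,} req/s fills 25 Gbit/s at ~{bytes_per_response / 1024:.1f} KiB/response")
```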
Ingleside
Veteran Cruncher | Norway | Joined: Nov 19, 2005 | Post Count: 974 | Status: Offline
> Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either a slow network connection or slow server storage, unsure.

If ARP1 downloads really were at 50 megabytes per second, even the largest file would download in 1 second and there wouldn't be any problems. Unfortunately, in reality ARP1 downloads have been down to around 50 kilobytes per second for any of the 10+ MB files, and at 1+ minute per file that ties up the download server for the same amount of time. With multiple large files, I wouldn't be surprised if a single ARP1 WU could take close to 10 minutes of download time (not counting all the hours spent waiting to actually get a connection).

Thankfully, it seems ARP1 downloads have greatly improved: I did manage to get a single new WU where the input_d0? files now came down at 1-2.5 megabytes per second, and the largest file at roughly 4 megabytes per second. Whether this is a real improvement, or just a result of the download servers not being swamped with all the tiny 1 KB files for the other types of WUs, is harder to know, since with ARP1 "committed to other platforms" it doesn't really look like there's much ARP1 going out at all.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
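The arithmetic behind those estimates, as a small Python sketch (the 10 MB file size and the observed speeds come from the post above):

```python
# Time to download one ~10 MB ARP1 input file at various observed speeds.
FILE_KB = 10 * 1000  # a 10 MB input file

for speed_kb_s in (50, 1000, 2500, 4000):
    seconds = FILE_KB / speed_kb_s
    print(f"{speed_kb_s:>5} kB/s: {seconds:6.1f} s per file")

# At 50 kB/s each 10 MB file ties up a download slot for over three
# minutes, so a WU with several large inputs approaches ten minutes of
# pure download time; at 1-4 MB/s the same file takes only seconds.
```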
Aperture_Science_Innovators
Advanced Cruncher | United States | Joined: Jul 6, 2009 | Post Count: 139 | Status: Offline
> ...since with ARP1 "committed to other platforms" it doesn't really look like there's much ARP1 going out at all.

Do you set your devices to run mostly/entirely ARP tasks? I have my preferences set to give each device 10-12 ARP tasks at a time (I could probably raise this, given that work is coming in pretty stably, so they are getting lots of other WUs too), but they're not having a hard time pulling that down. The most recent work fetch on one of my systems pulled down 4 ARP tasks out of about 25 tasks total, which is not a bad ratio.
Hans Sveen
Veteran Cruncher | Norge | Joined: Feb 18, 2008 | Post Count: 802 | Status: Offline
Hello!
Since this morning I have received about 100 new WUs, mostly OPN1. Some ARP1 and MCM also downloaded without any extra clicking on retry. While writing this I just got 13 OPNG WUs, also with no retries; maybe we can see much more light at the end of the long tunnel!? 😍

Keep up the good work!

With regards,
H. Sveen