Total posts in this thread: 214
This topic has been viewed 69936 times and has 213 replies
Robokapp
Senior Cruncher
Joined: Feb 6, 2012
Post Count: 248
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)


But yes, people mashing the update button doesn't help the system, and it's an incredibly wasteful use of one's time.


Scripts and autoclickers do the mashing. I doubt any noteworthy amount of retries is done by hand.

Sadly, they also mash when no mash is needed, further stressing the server.
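
As an illustration of a gentler alternative to constant retrying, here is a minimal sketch of a capped exponential backoff schedule. The function name and the base and cap values are hypothetical, chosen only for illustration; they are not the BOINC client's actual retry settings.

```python
import random


def backoff_delays(attempts, base=60.0, cap=3600.0, rng=None):
    """Return a list of wait times (seconds) for successive retries.

    The ceiling doubles after every failed attempt, up to `cap`.
    If `rng` is given, each wait is drawn uniformly from [0, ceiling]
    ("full jitter") so that many clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        if rng is not None:
            delays.append(rng.uniform(0, ceiling))
        else:
            delays.append(ceiling)
    return delays


if __name__ == "__main__":
    # Without jitter: the wait doubles each time until it reaches the cap.
    print(backoff_delays(8))
    # With jitter: random waits below the same ceilings.
    print(backoff_delays(8, rng=random.Random(42)))
```

A script that waits like this between retries sends far fewer requests during an outage than one that retries on a fixed short interval.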
[Sep 26, 2022 5:16:06 AM]
marbesoz
Cruncher
Joined: Jul 4, 2020
Post Count: 8
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

In a testing phase after six months? We cannot continue with the "politically correct" approach; it is time to say that IBM has transferred WCG to a structure that lacks adequate competence. I am very angry ...
[Sep 26, 2022 11:44:08 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2089
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Well, according to external stats sites like BoincStats, WCG has had only 16156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab, with only 16156 users, has such enormous problems, just think about what will happen if the rest of the WCG users decide to return. 16156 is just a fraction of the number of users WCG had under IBM.

Even if they fix the problems they have now with downloads, they will come to a screeching halt if even a small number of WCG users from before the transition return.

This is an infrastructure problem for sure. They need more real iron, and not just more VMs. I also really question whether SHARCNET really has enough bandwidth to be able to host WCG. This is certainly not looking good. This migration was certainly not well thought out, and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Sep 26, 2022 2:46:17 PM]
[Sep 26, 2022 2:34:29 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1932
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Well, according to external stats sites like BoincStats, WCG has had only 16156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab, with only 16156 users, has such enormous problems, just think about what will happen if the rest of the WCG users decide to return. 16156 is just a fraction of the number of users WCG had under IBM.
I don't remember the exact number now, but I think the number of ACTIVE users before the move was around 80,000. So we are at around 20-25% of the previous number of active users. And with it taking pretty much a year to get things going again, I am not sure we will (ever) get back to those pre-move numbers...
This is an infrastructure problem for sure. They need more real iron, and not just more VMs. I also really question whether SHARCNET really has enough bandwidth to be able to host WCG.
I am not sure if you have followed previous posts by me (and Alan and a couple of others), but the current issues are certainly not a "bandwidth" issue, but rather a connection issue. Those two things are decidedly different, and the latter is something that needs to be solved at Krembil's end; it's not an ISP issue. The post from Christian ("cubes") kind of confirmed this, and his reply has so far been the first one that makes me think they are finally on the right track...

Haven't seen any OPNG WUs since I got up this morning (there were a couple of batches late last night (PDT)), but so far, I have not seen ANY stuck or stalled downloads this morning. But then I might have just jinxed it and the problems might be back as soon as OPNG WUs are again released into the wild... ;-)
This is certainly not looking good. This migration was certainly not well thought out,
No argument from me here. I think a lot of the current/recent problems could have been prevented/minimized in the months between the announcement and the move, as well as in the months before they started to bring things back online. (Certificates!)
and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.
I seriously hope not. Besides, that would be VERY bad publicity for Krembil, and I am not sure that is something they can really afford. It would cast serious doubt on their capability as a research institute...

Ralf
----------------------------------------

[Sep 26, 2022 6:44:01 PM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 242
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Well, according to external stats sites like BoincStats, WCG has had only 16156 users with any credit at all in the last 24 hours. That is nothing compared to before the transfer to Krembil/Jurisica Lab. So if WCG under Krembil/Jurisica Lab, with only 16156 users, has such enormous problems, just think about what will happen if the rest of the WCG users decide to return. 16156 is just a fraction of the number of users WCG had under IBM.

Even if they fix the problems they have now with downloads, they will come to a screeching halt if even a small number of WCG users from before the transition return.

This is an infrastructure problem for sure. They need more real iron, and not just more VMs. I also really question whether SHARCNET really has enough bandwidth to be able to host WCG. This is certainly not looking good. This migration was certainly not well thought out, and I will not be surprised at all if Krembil decides to tell Jurisica Lab to just pull the plug on WCG.


For the latter part, I sure hope not. I am grateful they were willing to take this on after IBM decided they were done. I've been part of this charitable project from the beginning. However, it's clear they have been in over their heads with this project. If it does at some point move on, it should go to someone like Elon Musk, who would have the resources to regrow the project and make it succeed. There's certainly no argument we've lost a lot of contributors due to the many months of downtime and the buggy restart.
----------------------------------------

“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------
[Edit 2 times, last edit by Paul Schlaffer at Sep 27, 2022 2:47:02 AM]
[Sep 27, 2022 2:02:18 AM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Our load balancer runs HAProxy.

Proxy, and 65535 TCP connections limit per address...

An HTTP server may be able to handle millions of incoming connections, but if this is a proxy which forwards all those connections to only 1 or 2 addresses, then with as few as 65535 connections to 1 address or 131070 to 2 addresses, it will run out of TCP ports and have random "no server available" problems. I'm unsure if this would help, but a possible workaround is to just add more IPv4/IPv6 addresses to the upload/download server, maybe with virtual addresses or something.
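
The arithmetic behind those figures can be sketched as follows. This is only an upper-bound calculation under stated assumptions: one source port per connection, and a function name invented for illustration. Whether WCG's HAProxy setup actually runs into this limit is not established in this thread.

```python
def max_concurrent_connections(source_ips, backend_endpoints, ports_per_pair=65535):
    """Upper bound on simultaneous proxy-to-backend TCP connections.

    Each connection needs a unique (source IP, source port, destination IP,
    destination port) tuple, so every (source IP, backend endpoint) pair can
    use each local source port at most once.
    """
    return source_ips * backend_endpoints * ports_per_pair


# Default Linux ephemeral port range (net.ipv4.ip_local_port_range) is
# 32768-60999, which is smaller than the theoretical 65535 ports.
LINUX_DEFAULT_EPHEMERAL_PORTS = 60999 - 32768 + 1

if __name__ == "__main__":
    print(max_concurrent_connections(1, 1))  # one backend address
    print(max_concurrent_connections(1, 2))  # two backend addresses
    # Same two backends, but limited to the default Linux ephemeral range.
    print(max_concurrent_connections(1, 2, LINUX_DEFAULT_EPHEMERAL_PORTS))
```

Adding source addresses on the proxy or more backend endpoints raises the bound, which is the workaround suggested above.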

Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either slow network connection or slow server storage, unsure.
[Sep 27, 2022 3:38:37 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1932
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Our load balancer runs HAProxy.

Proxy, and 65535 TCP connections limit per address...

An HTTP server may be able to handle millions of incoming connections, but if this is a proxy which forwards all those connections to only 1 or 2 addresses, then with as few as 65535 connections to 1 address or 131070 to 2 addresses, it will run out of TCP ports and have random "no server available" problems. I'm unsure if this would help, but a possible workaround is to just add more IPv4/IPv6 addresses to the upload/download server, maybe with virtual addresses or something.

Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either slow network connection or slow server storage, unsure.
Sorry, but you don't know what you are talking about. HAProxy is a proxy built specifically for load balancing and has no 64k TCP port limitation. To quote:

Servers equipped with 6 to 8 cores generally achieve between 200000 and 500000 requests per second, and have no trouble saturating a 25 Gbit/s connection under Linux


Ralf
----------------------------------------

[Sep 27, 2022 5:52:40 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Slow ARP1 downloads at 50 MegaByte per second have often been a problem as well. Might be either slow network connection or slow server storage, unsure.

If ARP1 downloads really were at 50 Megabyte per second, even the largest file would download in 1 second and there wouldn't be any problems.

Unfortunately, in reality ARP1 downloads have been down to around 50 kilobyte per second for any of the 10+ MB files, and at 1+ minute per file this ties up the download server for the same amount of time. With multiple large files, I wouldn't be surprised if a single ARP1 wu took close to 10 minutes of download time (not counting all the hours waiting to actually get a connection).
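
To make the difference between the two rates concrete, here is a small back-of-the-envelope calculation. The 10 MB file size is an assumption taken from the "10+ MB" figure above, and decimal units (1 MB = 1,000,000 bytes) are assumed throughout.

```python
KB = 1_000        # decimal kilobyte, assumed
MB = 1_000_000    # decimal megabyte, assumed


def download_seconds(size_bytes, rate_bytes_per_second):
    """Time to transfer a file at a constant rate, ignoring connection setup."""
    return size_bytes / rate_bytes_per_second


if __name__ == "__main__":
    file_size = 10 * MB  # hypothetical ARP1 input file
    print(download_seconds(file_size, 50 * KB))  # at 50 kilobyte per second
    print(download_seconds(file_size, 50 * MB))  # at 50 Megabyte per second
```

At 50 kilobyte per second a 10 MB file takes 200 seconds, while at 50 Megabyte per second it takes a fraction of a second, a factor of 1000 apart.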

Thankfully it seems ARP1 downloads have greatly improved, since I did manage to get a single new wu where the input_d0? files' speed was now 1 Megabyte/s - 2.5 Megabyte/s and the largest file's speed was roughly 4 Megabyte/s.

Whether this is a real improvement due to the download servers not being swamped with all the tiny 1 KB files for other types of wu's is more difficult to know, since with ARP1 "committed to other platforms" it doesn't really look like there's much ARP1 going out at all.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Sep 27, 2022 12:44:00 PM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

...since with ARP1 "committed to other platforms" it doesn't really look like there's much ARP1 going out at all.



Do you set your devices to run mostly/entirely ARP tasks? I have my preferences set to give each device 10-12 ARP tasks at a time (I can probably raise this, given that work is coming in pretty stably, so they are getting lots of other WUs too), but they're not having a hard time pulling that down. My most recent work fetch on one of my systems pulled down 4 ARP tasks out of about 25 tasks total, which is not a bad ratio.
----------------------------------------

[Sep 27, 2022 1:27:45 PM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 802
Status: Offline
Re: 2022-09-15 Update (Networking & Workunits)

Hello!
Since this morning I have got about 100 new wu's, mostly OPN1.
Some ARP1 and MCM were also downloaded without any extra clicking on retry.

While writing this I just got 13 OPNG wu's, also with no retries. Maybe we can see much more light at the end of the long tunnel!?😍

Keep up the good work!

With regards,
H.Sveen
[Sep 27, 2022 2:39:05 PM]