Thread Status: Active
Total posts in this thread: 29
Posts: 29   Pages: 3   [ 1 2 3 | Next Page ]
This topic has been viewed 106620 times and has 28 replies
Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Status: Offline
2022-11-10 Update (New Storage & Weekly Results)

Hi everyone,

As described earlier, WCG has transitioned from IBM cloud infrastructure to our own physical servers, hosted at the University of Waterloo and supported by the Sharcnet HPC facility. Thus, the “migration” process required rebuilding the WCG system on different hardware. Unfortunately, the performance and capacity of our system are lower than those of the IBM cloud setup. While extensive benchmarking was done to confirm that it is sufficient and that the hard drive storage system would perform at least adequately for the time being, we know it is not sufficient going forward, and thus we continue searching for partners and resources for upgrading our servers and the storage system. Many of the failures, errors and challenges we encountered during the transition required continuous tweaking of the system to ensure it does not choke as the volume of workunits or the number of volunteers increases.

It is with extreme excitement that we can announce that Sharcnet has helped us obtain new storage with sufficient SSD capacity and speed for use by WCG. The new storage should substantially improve database and scheduler performance and improve the overall throughput of the workunit management system and database servers. Once it is operational, we will optimize our system configuration and test it before putting it into production. We will keep you updated on the timeline for implementing this upgrade.

In the meantime, we would like to thank our most valuable “alpha tester” volunteers, as without you we would not be able to finalize the system and start producing research results for the current projects. We recognize that some projects have been given more workunits to crunch than others, and we are working to equalize the distribution. The ARP project is starting again, with more workunits available soon, and HSTB is going to restart in the coming weeks.





If you have any questions, please leave them in this forum thread. Thank you for your support, patience and understanding.

WCG team at Krembil Research Institute
----------------------------------------
[Edit 1 times, last edit by Cyclops at Nov 10, 2022 9:36:35 PM]
[Nov 10, 2022 9:30:26 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

Well, well, well....

For one, I think it would be more appropriate to tone down that pep talk a bit. After all the failures for the better part of the year, and after months with very little effective gain, this just doesn't sound sincere anymore.

Communication is still something that needs to improve, and that doesn't really depend on any new storage or benchmarking.

And while showing the number of WUs and volunteer-provided CPU years is nice, at this point, after several frustrating months, I would much rather see something indicating that the number of download errors and retries is effectively going down. So far, it seems that curve would run parallel to the ones you have provided, with the same upward trend.

Also kind of interesting to know who those "most valuable alpha testers" would be... confused

Ralf
[Nov 10, 2022 10:22:22 PM]
Greg_BE
Advanced Cruncher
Joined: May 9, 2016
Post Count: 124
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

So what is the difference between SHARCNET HPC and SHARCNET (solo)? Is SHARCNET going to host you or give you hardware?

You know what you are trying to say, but to us it's just a lot of generalities.

What I want to know is how you are going to solve this http: transient error issue.
Will SHARCNET eliminate that or is that something that is happening with your current limited physical system?

Are you running on a physical system there at Krembil, or are you working off of some hybrid of SHARCNET and your physical system?
[Nov 10, 2022 11:22:27 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

Thank you so much for taking on WCG and not letting it die. I'm so excited you found a solution. I look forward to crunching even more WUs!
I thank any alpha testers who helped work out the bugs, so the rest of us can have a better system.

Thanks for the update!
[Nov 10, 2022 11:50:58 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

It's telling that none of the dedicated WCG sysadmins, network admins, database admins, etc. are using the forums to communicate directly with us.

Question: How fast is WCG's Internet connection? I'm getting 300 KB/sec download speed when downloading the ~100 MB MCM sarcoma data file, which takes several minutes. I ask because if the throughput is this slow with a minimal number of volunteers hitting WCG, it means that WCG is already maxed out as far as Internet connection speed.
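
As a rough check of those figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1 MB = 1,000,000 bytes) and a file of exactly 100 MB, and it tries both readings of the quoted rate (kilobytes vs. kilobits per second), since the post does not make the unit unambiguous. The function name `download_seconds` is just illustrative.

```python
def download_seconds(size_mb: float, rate: float, unit: str) -> float:
    """Return the transfer time in seconds for a file of size_mb megabytes.

    Assumes decimal units: 1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes.
    """
    bytes_per_second = {
        "KB/s": rate * 1_000,        # kilobytes per second
        "kbit/s": rate * 1_000 / 8,  # kilobits per second, 8 bits per byte
    }[unit]
    return size_mb * 1_000_000 / bytes_per_second


FILE_MB = 100  # approximate file size quoted in the post (assumed exact here)

for rate, unit in [(300, "KB/s"), (300, "kbit/s")]:
    minutes = download_seconds(FILE_MB, rate, unit) / 60
    print(f"{FILE_MB} MB at {rate} {unit}: about {minutes:.1f} minutes")
```

Under the kilobyte reading this comes to roughly five and a half minutes, which matches "several minutes"; under the kilobit reading it would be closer to three quarters of an hour.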
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 1 times, last edit by hchc at Nov 11, 2022 12:55:09 AM]
[Nov 11, 2022 12:38:18 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)



It's a little confusing that the "Total" (blue) curve is the one at the very bottom.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Nov 11, 2022 12:56:36 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

Question: How fast is WCG's Internet connection? I'm getting 300 KB/sec download speed when downloading the ~100 MB MCM sarcoma data file, which takes several minutes.
The last time I watched this file downloading, it came in at 1.9 MB/sec for me, taking a little over a minute (and I am on a business 500/500 Mbit/sec connection)...

Ralf
[Nov 11, 2022 1:33:15 AM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 278
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

Thank you for the update. It's good to see the project is getting some new equipment that many here suspected was needed.

If funding is needed to acquire equipment or additional bandwidth, that need/goal should be clearly articulated in the Donate section of the website and announced here in News. Then those who want to help towards that end can do so. However, transparency is key, especially for a grid project like this, which involves a large number of people.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------
[Edit 1 times, last edit by Paul Schlaffer at Nov 11, 2022 4:00:05 AM]
[Nov 11, 2022 3:58:27 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)

Thank you for the update.

Unfortunately, the performance and capacity of our system are lower than those of the IBM cloud setup. While extensive benchmarking was done to confirm that it is sufficient and that the hard drive storage system would perform at least adequately for the time being, we know it is not sufficient going forward, and thus we continue searching for partners and resources for upgrading our servers and the storage system.


As with many things, "sufficient" does not cut it. Benchmarking gives a synthetic value which sometimes does not correspond to real-world conditions. As an analogy, I will give the example of a restaurant. If you look at the place at 9:00 AM, you would think the resources available are sufficient. But if you look at 12:00 PM, those same resources are no longer sufficient because the lunchtime crowd has arrived. What looks like it will work on paper does not always work in practice.
I am reminded of an old design engineer I knew who worked for an industrial supply manufacturer. They would rate their products to work at 100% for specified conditions, but they would engineer them at 250% for those same conditions because they were proud of the reliability of their products. In other words, perhaps the benchmarking should have been aimed at sufficient plus 100%.
Hopefully as the staff there gain experience this will happen.
I wish you good luck in finding the partners and resources for the upgrades on your hardware.

Cheers
Edit: spelling
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Nov 11, 2022 1:41:22 PM]
[Nov 11, 2022 4:16:21 AM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 238
Status: Offline
Re: 2022-11-10 Update (New Storage & Weekly Results)


Also kind of interesting to know who those "most valuable alpha testers" would be... confused

Ralf


I still can't shake the feeling WE all are the alpha testers.
[Nov 11, 2022 5:26:58 AM]