Total posts in this thread: 352
This topic has been viewed 30223 times and has 351 replies
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

The Compute Canada status page is reporting that Nibi has a power outage (flagged up 3 days ago, still open...). It appears they are still having some fairly severe "snagging" issues :-(

Cheers - Al.
[Jul 28, 2025 5:04:13 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Offline
Re: Project Status (First Post Updated)

Thanks for the info Al. I missed looking at that page.

The old system and the new system have to share electricity, and that is why the old system was reduced to 25%, but this looks like a further electrical issue.

It gives me something to keep my eye on for signs of progress though.
[Jul 28, 2025 2:52:46 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Offline
Re: Project Status (First Post Updated)

Here is the new update:

July 28, 2025

MAM1 Beta 7.05 is in the final stages of alpha testing, and we are debugging checkpointing issues under the BOINC client. Upon release, the Linux build will require GLIBC > 2.29. We forked the ensmallen library, "a high-quality C++ library for non-linear numerical optimization", and used cereal to implement serializable versions of the Simulated Annealing (SA) and Particle Swarm Optimization (PSO) algorithms from ensmallen, supplementing the existing random methods in MCM1 and MAM1.
The new multi-stage MAM1 build compiles and statically links to LibTorch, OpenCV, dlib, and Apache Arrow for Parquet support, providing a state-of-the-art platform for general scientific inquiry and launching MAM1 and future projects. In addition to SVM, we now support RandomForest, AdaBoost, and Neural Networks (MLP/CNN) as classifiers to train and evaluate promising gene signatures identified by heuristic search of candidate gene signatures via SA and PSO.
We have been working towards a CUDA-enabled LibTorch build for the application as well, and only linker errors remain before we can test setting torch::DeviceType to torch::kCUDA in place of torch::kCPU and see if it might be that easy.
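
For anyone wondering why "serializable" optimizers matter for checkpointing: a run can only resume where it left off if the whole optimizer state (current solution, temperature, iteration count and random number generator state) is saved. The toy sketch below illustrates the idea in Python. It is not the MAM1 code and does not use ensmallen or cereal; the class name, parameters and the use of pickle are all assumptions made for illustration.

```python
import math
import pickle
import random


class CheckpointableSA:
    """Toy 1-D simulated annealing whose full state can be saved and restored.

    Hypothetical illustration only; not the ensmallen/cereal implementation.
    """

    def __init__(self, objective, x0, temperature=1.0, cooling=0.99, step=0.5, seed=0):
        self.objective = objective
        self.x = x0
        self.fx = objective(x0)
        self.temperature = temperature
        self.cooling = cooling
        self.step = step
        self.iteration = 0
        self.rng = random.Random(seed)

    def run(self, iterations):
        for _ in range(iterations):
            candidate = self.x + self.rng.uniform(-self.step, self.step)
            f_candidate = self.objective(candidate)
            delta = f_candidate - self.fx
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature drops.
            if delta <= 0 or self.rng.random() < math.exp(-delta / self.temperature):
                self.x, self.fx = candidate, f_candidate
            self.temperature *= self.cooling
            self.iteration += 1
        return self.x, self.fx

    def checkpoint(self):
        """Serialize everything except the objective function itself."""
        state = dict(self.__dict__)
        del state["objective"]
        state["rng"] = self.rng.getstate()  # RNG state is needed for an exact resume
        return pickle.dumps(state)

    @classmethod
    def restore(cls, objective, blob):
        """Rebuild an optimizer from a checkpoint produced by checkpoint()."""
        state = pickle.loads(blob)
        rng = random.Random()
        rng.setstate(state.pop("rng"))
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        obj.objective = objective
        obj.rng = rng
        return obj
```

Because the RNG state is part of the checkpoint, running 100 iterations, saving, restoring and running another 100 gives the same result as 200 iterations in one go. That is the property a work unit needs if it is to survive a client restart without redoing work.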
----------------------------------------
[Edit 1 times, last edit by Unixchick at Jul 28, 2025 11:13:00 PM]
[Jul 28, 2025 11:11:44 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

When I saw the new update, my first reaction as a Linux user was to that library version requirement, as there have recently been posts at CPDN about some RedHat-based Linux versions in common use having an older main C library than that needed by recently rebuilt software there!

So I have just sifted through my recent wingmen data to see how many cases of pre-2.3x library versions I could see...

There are quite a few Debian, Ubuntu and Linux Mint users still on relatively old kernels (and hence older libraries), but there are also some Red Hat Enterprise Linux 8.10 users. That said, they contributed less than 2% of the wingmen results I looked at.

I think that [nearly] all users ought to be able to get up to an acceptable library version; RHEL 9, recent Fedora releases and the latest CentOS seem to be o.k., and anything recent that is loosely based on Debian or Ubuntu should be o.k. too.

So it looks as if that won't be an issue for the majority of Linux users. I have no idea what will happen to FreeBSD users...
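
For anyone who wants to check their own machine against the "GLIBC > 2.29" requirement quoted in the update, here is a minimal sketch. It assumes Python 3 is available and relies on the standard library's platform.libc_ver(); the helper names are my own, and on non-glibc systems (musl, FreeBSD and so on) the lookup may simply return nothing useful.

```python
import platform
import re

# From the quoted update: "GLIBC > 2.29" (strictly greater than 2.29).
REQUIRED_ABOVE = (2, 29)


def parse_version(text):
    """Turn a string such as '2.31' into (2, 31); return None if it does not parse."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def glibc_is_new_enough(version_text, required_above=REQUIRED_ABOVE):
    """True if the version is strictly newer than the required threshold."""
    version = parse_version(version_text)
    return version is not None and version > required_above


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc":
        print("Could not identify glibc on this system (reported: %r)" % lib)
    elif glibc_is_new_enough(version):
        print("glibc %s is newer than 2.29" % version)
    else:
        print("glibc %s is NOT newer than 2.29" % version)
```

Versions are compared as number pairs rather than as strings or floats, so that 2.9 is correctly treated as older than 2.29.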

That said, it'll be interesting to watch for the error reports when the Betas restart :-)

Cheers - Al.
[Jul 29, 2025 2:04:11 AM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Project Status (First Post Updated)

I wish we had a time frame for releasing this round of beta testing.
But I AM grateful for the status report.
Thank You!
[Jul 29, 2025 5:49:19 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Offline
Re: Project Status (First Post Updated)

We are getting a good flow of MCM and ARP. Good times.

The Nibi power issue is resolved.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Jul 30, 2025 3:34:53 PM]
[Jul 30, 2025 3:14:58 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

I see they had another outage at Nibi on the 29th, now resolved. However, I find the following a bit worrying:
We believe the cause of this was an unexpected interaction between high-availability features (quorum of hypervisors) with more than one hardware failure. We think we can prevent similar things from happening in the future.
Still in the snagging period, it seems!

Cheers - Al.

[Edit] I forgot to add that the power outage item was amended to say it cleared on the day it happened...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jul 30, 2025 7:29:04 PM]
[Jul 30, 2025 5:09:07 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Project Status (First Post Updated)

We are getting a good flow of MCM and ARP. Good times.

The Nibi power issue is resolved.
Not sure what the power outage issue was, but I can confirm that all my hosts have their queues full, and I even got a full dozen ARP1 WUs.
Let's see if this holds up for a week or longer, for a change...

Ralf
[Jul 31, 2025 1:28:33 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

And we have BETA test action at last...

I've posted a new thread in the Beta Test forum in the vain hope that users won't generate a multiplicity of threads with lots of repeated questions or information in them -- let's wait and see.

I can only comment about Linux; I suspect that's where they've started off but if anyone sights Windows examples...

I've not had time to have a really good look at what's happened so far, but a quick glance shows run times ranging from about 50 minutes to just over 5 hours. I suspect this batch is by way of a tuning exercise as much as anything else!

I might post something in the Beta thread after I've had a chance to look at the parameters (which, fortunately, are shown as debug information in the stderr file); the new program uses a completely different control file so my existing catch-scripts (familiar with MCM1 and the first MAM1 beta) didn't pick up any configuration information. More programming coming up (sigh...)
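
As a rough idea of what such a catch-script might do, here is a minimal sketch that pulls "name = value" pairs out of debug lines in a stderr file. The "[debug] name = value" line format, the parameter names and the function name are purely hypothetical; I have not confirmed what the new application actually writes, so the pattern would need adjusting to the real output.

```python
import re

# HYPOTHETICAL line format: "[debug] name = value".
# The real stderr layout of the new application may differ.
DEBUG_PARAM = re.compile(r"^\s*\[debug\]\s+(\w+)\s*=\s*(.*\S)\s*$")


def extract_parameters(stderr_text):
    """Collect name/value pairs from debug lines; other lines are ignored."""
    params = {}
    for line in stderr_text.splitlines():
        match = DEBUG_PARAM.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


if __name__ == "__main__":
    # Made-up sample text, only to show the shape of the result.
    sample = (
        "Starting task\n"
        "[debug] optimizer = SA\n"
        "[debug] max_iterations = 5000\n"
        "some unrelated line\n"
    )
    for name, value in extract_parameters(sample).items():
        print(name, "->", value)
```

Values are kept as strings, which leaves any type conversion to whatever consumes the result.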

Cheers - Al.
[Jul 31, 2025 2:53:48 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Offline
Re: Project Status (First Post Updated)

We have 3 projects flowing today.

Please post on Al's thread if you are one of the lucky ones to get a MAM Beta. With such a small number of WUs going out in a burst it is hard to tell whether it is flowing or not, but we are in the testing phase and WUs went out within the last 24 hours, so I'll leave the status green.

ARP is flowing. We are nearing the end of the 149 group, but it is harder to tell as they no longer come in a strict order. I'm guessing they split the WUs into batches for different machines to deliver to us. The question is: will they start group 150 right away or wait until Monday?

MCM are flowing well.

On a personal note, I'm now running 4 ARPs at a time and getting ever closer to my 2 year badge.
[Jul 31, 2025 1:54:58 PM]