Thread Status: Active
Total posts in this thread: 567
This topic has been viewed 42066 times and has 566 replies
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2493
Status: Offline
Re: Project Status (First Post Updated)

Yes, the initial website problems seem to be fixed. Let's see if they can put the 43 .png files in the right directory. They do not exist where BOINC tries to download them from (yet).
Example for one of them: https://download.worldcommunitygrid.org/boinc/slideshow/mcm1_01_v01.png
Gives the following error: Not Found
The requested URL was not found on this server.


The whole WCG team are champions in my book, no matter what other participants say.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Oct 8, 2025 3:18:23 AM]
[Oct 8, 2025 2:50:16 AM]
Kirel2
Advanced Cruncher
United States
Joined: Sep 24, 2014
Post Count: 118
Status: Offline
Re: Project Status (First Post Updated)

October 7, 2025
We have resolved the issue with the BOINC scheduler configuration causing "Another scheduler instance is running for this host". Users should be able to report tasks. We will update as soon as we begin creating new workunits as we are still working to stand up the rest of the BOINC backend architecture.
Website went down briefly as we brought the scheduler online. We have adjusted the HAProxy configuration, and we will continue to adjust Apache/HAProxy config if we see the website stops responding again.
Still debugging issues with the new Kafka-based validation workflow. It works together with HAProxy routing rules to partition BOINC downloads and uploads by assigning servers equal hex buckets in the directory hierarchy BOINC expects (https://github.com/BOINC/boinc/wiki/DirHierarchy), and the new file_upload_handler we wrote emits events to Kafka so we can batch and respond to them in parallel. This removes the need for multiple round trips to the database for row-wise operations and polling, which are now simply batch applications of state after consuming workunits ready for validation in the relevant Kafka topic for that application. This allows us to perform validation and assimilation in the same process, at least for the projects we run ourselves (MCM1, MAM1, ARP1). While the Kafka/Redpanda learning curve was significant, we have successfully transitioned to an event-driven, in-memory, partitioned architecture that should let us keep pace with the upcoming GPU-enabled MAM1 application.
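For readers wondering what "assigning servers equal hex buckets" could look like in practice, here is a rough sketch. It is not WCG's code: the MD5-based hash, the fan-out of 1024, the server names, and every function name below are assumptions made for illustration only. The idea is that a filename hashes to a hex-named bucket, and each server owns an equal, contiguous range of buckets.

```python
import hashlib

FANOUT = 1024  # assumed number of hex-named buckets (hypothetical value)


def bucket_for(filename: str, fanout: int = FANOUT) -> int:
    """Hash a filename into one of `fanout` buckets (illustrative hash choice)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % fanout


def server_for(bucket: int, servers: list[str], fanout: int = FANOUT) -> str:
    """Give each server an equal, contiguous range of bucket numbers."""
    per_server = fanout // len(servers)
    # Any leftover buckets (when fanout is not divisible) go to the last server.
    index = min(bucket // per_server, len(servers) - 1)
    return servers[index]


def route(filename: str, servers: list[str]) -> tuple[str, str]:
    """Return (hex bucket name, server) for a file; the same name always routes the same way."""
    bucket = bucket_for(filename)
    return f"{bucket:x}", server_for(bucket, servers)


if __name__ == "__main__":
    # Hypothetical server names, not real WCG hosts.
    print(route("mcm1_01_v01.png", ["dl-a", "dl-b", "dl-c", "dl-d"]))
```

Because the routing depends only on the filename, a proxy in front of the servers could send uploads and downloads for the same file to the same machine without consulting a database.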
----------------------------------------

[Oct 8, 2025 2:53:27 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Project Status (First Post Updated)

Reporting of uploaded files seemed to start working between 00:15 and 00:30 UTC on 2025-10-08.

Download servers are probably still behind an ACL block (trying to get the three ARP1 files as a test still gets 403: Forbidden...) -- and most of those .png files are for dead projects anyway!

As at 02:45 UTC there's evidence that reported results have landed (my wingmen "in progress" count has dropped significantly whilst the tasks in PVal jail have gone up to match!), so validator and friends didn't seem to be active at this point in time.

I'll check again later, but I wouldn't be surprised if they have decided to leave the next stage until their next working day...

Cheers - Al.
[Oct 8, 2025 2:58:06 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2493
Status: Offline
Re: Project Status (First Post Updated)

I don't think we can expect any validations or new tasks until perhaps tomorrow. There are still things to do...

Edit, added: And as usual after an outage, Device Profile changes do not propagate to the BOINC client.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Oct 8, 2025 3:56:11 AM]
[Oct 8, 2025 3:53:30 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Project Status (First Post Updated)

[Edit -- sorry for the repost; I was typing in an earlier message at the same time Kirel2 posted it originally, and didn't scroll back until more recently...]

Another Operational Status report...
October 7, 2025
  • We have resolved the issue with the BOINC scheduler configuration causing "Another scheduler instance is running for this host". Users should be able to report tasks. We will update as soon as we begin creating new workunits as we are still working to stand up the rest of the BOINC backend architecture.
  • Website went down briefly as we brought the scheduler online. We have adjusted the HAProxy configuration, and we will continue to adjust Apache/HAProxy config if we see the website stops responding again.
  • Still debugging issues with the new Kafka-based validation workflow. It works together with HAProxy routing rules to partition BOINC downloads and uploads by assigning servers equal hex buckets in the directory hierarchy BOINC expects (https://github.com/BOINC/boinc/wiki/DirHierarchy), and the new file_upload_handler we wrote emits events to Kafka so we can batch and respond to them in parallel. This removes the need for multiple round trips to the database for row-wise operations and polling, which are now simply batch applications of state after consuming workunits ready for validation in the relevant Kafka topic for that application. This allows us to perform validation and assimilation in the same process, at least for the projects we run ourselves (MCM1, MAM1, ARP1). While the Kafka/Redpanda learning curve was significant, we have successfully transitioned to an event-driven, in-memory, partitioned architecture that should let us keep pace with the upcoming GPU-enabled MAM1 application.
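
The "batch applications of state" idea in the third bullet can be illustrated with a small sketch. This is only a guess at the general pattern, not WCG's implementation: no real Kafka client is used, and the event fields (workunit_id, result_id), the quorum of 2, and the function name are all hypothetical. A batch of upload events is folded into in-memory state in one pass, and the workunits that just reached quorum are handed on for validation.

```python
from collections import defaultdict

QUORUM = 2  # assumed number of returned results needed before validation


def apply_batch(state, events, quorum=QUORUM):
    """Fold a batch of upload events into state; return workunits that newly reached quorum.

    `state` maps workunit_id -> set of result_ids seen so far, so a
    duplicate event for the same result changes nothing.
    """
    ready = []
    for event in events:
        results = state[event["workunit_id"]]
        before = len(results)
        results.add(event["result_id"])
        # Report a workunit only at the moment it crosses the quorum threshold.
        if before < quorum <= len(results):
            ready.append(event["workunit_id"])
    return ready


if __name__ == "__main__":
    state = defaultdict(set)
    batch = [
        {"workunit_id": 1, "result_id": "a"},
        {"workunit_id": 1, "result_id": "b"},
        {"workunit_id": 2, "result_id": "c"},
        {"workunit_id": 1, "result_id": "b"},  # duplicate delivery, ignored
    ]
    print(apply_batch(state, batch))
```

The point of the pattern is that one pass over a batch replaces a database query per result, which is the saving the status update describes.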

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Oct 8, 2025 4:54:37 PM]
[Oct 8, 2025 4:33:48 AM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Project Status (First Post Updated)

The only thing I can state for sure is that all 935 of my WUs that were ready to report have been transferred to WCG and I, as expected, have zero WUs here.

And as others have commented, they are still sorting through issues with validation and with new WUs for us volunteers to download.

Progress is being made, albeit slowly.
[Oct 8, 2025 4:56:10 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Recently Active
Re: Project Status (First Post Updated)

Interesting item in the third bullet point there! The latest core BOINC validator.cpp I could find still has the logic to request assimilation but block the transitioner for WUs with canonical results, so I wonder what they've done about that?

I can think of the following [basic] alternatives -- there may be others:

  • There is a newer BOINC-sanctioned validator.cpp that can cope with the "all in one" case
  • WCG have rewritten the relevant part of validator.cpp to deal with that
  • There are actually two co-operating processes for validation/assimilation that take advantage of a shared data pool whilst using the normal assimilator.cpp
  • There is still an assimilator process but it simply serves to let the transitioner see the WU again...

And another thought -- what's going to happen to the enormous backlog of validated MCM1 tasks that were sat waiting for assimilation -- they'll probably be in that "flagged for assimilation but hidden from transitioner" state, so I hope those have been taken into account :-) -- perhaps they will have to be marked for re-validation?

It will be interesting to see what happens when they do turn the validator/assimilator on, and perhaps at some future point they'll get a chance to describe what they did in layman's language! :-)

Cheers - Al.

[Edited to add reference to the assimilation backlog...]
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Oct 8, 2025 6:22:50 AM]
[Oct 8, 2025 5:19:20 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1293
Status: Offline
Re: Project Status (First Post Updated)

It was nice to see an official update. Looks like some changes to make the system more stable and faster.

People are reporting that all outstanding WUs have been uploaded; now we wait for the next steps.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Oct 8, 2025 2:51:49 PM]
[Oct 8, 2025 2:49:20 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2493
Status: Offline
Re: Project Status (First Post Updated)

Many wingmen for the hundreds of tasks that I reported yesterday still haven't reported their tasks. So even if they turn on the validation process, many of my tasks will remain in "Pending Validation", at least until the wingman's task reaches its deadline and a new task is sent out to someone else.

But there is certainly progress, and the light at the end of the tunnel is the sun, and not a fast approaching locomotive :-)
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Oct 8, 2025 9:30:05 PM]
[Oct 8, 2025 9:28:40 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Re: Project Status (First Post Updated)

I recall reading of a few volunteers initializing, deleting and/or reinstalling BOINC, thinking the problem that WCG was experiencing was on their end.

I wonder if many of your missing wingmen's tasks were casualties of that mistaken WU purge?

I agree with your tunnel observation! 8-)
[Oct 8, 2025 9:39:27 PM]