Garrulus glandarius
Advanced Cruncher
Romania
Joined: Apr 10, 2025
Post Count: 89
Re: Project Status (First Post Updated)

Just had a few dozen tasks validated, all from the current tests. Still got 400+ waiting for validation since August, but at least something is moving.
[Oct 26, 2025 9:54:25 AM]
wildhagen
Veteran Cruncher
The Netherlands
Joined: Jun 5, 2009
Post Count: 1004
Re: Project Status (First Post Updated)

Work seems to be flowing again. I'm getting a lot of new workunits, although these are still not "production" units but test units in the MCM1_9999995 range.
[Oct 26, 2025 10:20:29 AM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 300
Re: Project Status (First Post Updated)

I picked up 59 of these MCM1_9999995 test units between 09:43 and 10:20 UTC today (26 October).
As of now, 25 of these units have validated :)
In most cases, my wingman and I claimed and received about 40 credits.
In a few cases, my wingman claimed 202.5 (a known issue!).
In one case, my wingman claimed only 10: https://www.worldcommunitygrid.org/contribution/workunit/765899390.
Cheers,
Mark
[Oct 26, 2025 12:51:50 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2501
Re: Project Status (First Post Updated)

Well, the new batch must be over by now. I have had my computer turned off, and when turning it on again, I did not get any new tasks.
Let's see if Dylan sends out any more batches. Validation of the previous test batches seems to be ongoing, though. So that's progress...

Edit, added: I think validation of the previous test tasks stopped well before all test tasks had been validated.
----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Oct 26, 2025 2:49:55 PM]
[Oct 26, 2025 2:34:00 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1296
Re: Project Status (First Post Updated)

I'm seeing more test WUs validated. So I'm seeing progress.

Looks like a new test batch went out (9999995).
I haven't gotten any of these.

Edit to add: all my test WUs that have a second result have validated. That is, the few pending ones I have are due to the lack of a second result.
----------------------------------------
[Edit 1 times, last edit by Unixchick at Oct 26, 2025 7:18:38 PM]
[Oct 26, 2025 7:15:30 PM]
dylanht
World Community Grid Tech
Joined: Jul 1, 2021
Post Count: 35
Re: Project Status (First Post Updated)

Coming down the pipe later today: either a few more test batches or, tentatively and in the best case, the first contiguous normal batches back in the "MCM1_024%" range.

I redeployed the reducers, validators, and assimilators after fixing copious configuration issues and misunderstandings, bugs, and oversights that were causing the low validation throughput. A small victory: the "replay" seemed to go well, as far as our bare-bones Grafana+Prometheus monitoring dashboard for Kafka could show. (https://docs.redpanda.com/current/develop/data-transforms/deploy/#reprocess describes --from-offset, which effectively re-does a batch of workunit processing by pointing into the past in the upload event queue.) The transforms were able to keep up with the burst of re-reading a few million upload events and building up their reduced map of workunit names to uploaded result files. When a workunit reaches quorum, it is emitted as a new event for the validators to consume, and its key is deleted from the map in the transform.
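A minimal Python sketch of the reduce step described above, assuming a quorum of 2 and an in-memory dict standing in for the transform's state. `UploadEvent`, `QuorumReducer`, and the workunit names are hypothetical; this is not the Redpanda data transforms SDK or the actual WCG transform. Result files are kept in a set, so re-reading an upload event that is still pending does not count twice toward quorum.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UploadEvent:
    """Hypothetical upload event: one result file uploaded for one workunit."""
    workunit: str
    result_file: str


class QuorumReducer:
    """Hypothetical reducer: collects result files per workunit, emits at quorum."""

    def __init__(self, quorum: int = 2) -> None:
        self.quorum = quorum
        # Reduced map: workunit name -> names of result files uploaded so far.
        self.pending: dict[str, set[str]] = {}

    def process(self, event: UploadEvent) -> tuple[str, list[str]] | None:
        files = self.pending.setdefault(event.workunit, set())
        # A set ignores a duplicate of an upload event already seen.
        files.add(event.result_file)
        if len(files) < self.quorum:
            return None
        # Quorum reached: delete the key and emit a "ready to validate" event.
        del self.pending[event.workunit]
        return event.workunit, sorted(files)


# Usage: a duplicated upload event does not count twice toward quorum.
reducer = QuorumReducer(quorum=2)
events = [
    UploadEvent("wu_A", "wu_A_0"),
    UploadEvent("wu_A", "wu_A_0"),  # duplicate of the first event
    UploadEvent("wu_A", "wu_A_1"),
]
emitted = [out for e in events if (out := reducer.process(e)) is not None]
print(emitted)  # [('wu_A', ['wu_A_0', 'wu_A_1'])]
```

Because the key is deleted once quorum is reached, a replay that re-reads both uploads of an already-completed workunit will emit it again, so whatever consumes these events downstream has to be idempotent.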

There was also an issue with the validators: the payload buffer of the "successful validation" event was getting corrupted because I was wrongly emitting the event asynchronously and then reusing the buffer for the next pair of result files. There were other, even more embarrassing bugs, but now the validator properly flushes the buffer and emits the message synchronously, while still churning through results quickly, especially when it doesn't have to fall back to NFS to look for a result file. The assimilators seemed to build up their big credit dump and "state fast-tracking" transaction across the workunit, result, host, user, and team tables, and to respect the canonical-result and validate_state checks for idempotence.
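A small Python illustration of the validator bug class described above: if an asynchronous send keeps only a reference to a mutable buffer, reusing that buffer for the next pair overwrites the earlier payload before it is actually sent. `FakeProducer` and `emit_validations` are hypothetical stand-ins, not a Kafka or Redpanda client API; real clients differ in when (or whether) they copy a payload, so this only models the failure mode.

```python
from __future__ import annotations

from collections import deque


class FakeProducer:
    """Hypothetical stand-in for a message producer (not a real client API)."""

    def __init__(self) -> None:
        self.queue: deque = deque()
        self.sent: list[bytes] = []

    def send_async(self, payload: bytearray) -> None:
        # Only a reference is queued; the bytes are read later, in flush().
        self.queue.append(payload)

    def send_sync(self, payload: bytearray) -> None:
        # The bytes are copied immediately, so reusing the buffer afterwards is safe.
        self.sent.append(bytes(payload))

    def flush(self) -> None:
        while self.queue:
            self.sent.append(bytes(self.queue.popleft()))


def emit_validations(pairs: list[tuple[str, str]], synchronous: bool) -> list[bytes]:
    producer = FakeProducer()
    buf = bytearray()  # a single buffer reused for every pair of result files
    for first, second in pairs:
        buf.clear()
        buf.extend(f"validated:{first},{second}".encode())
        if synchronous:
            producer.send_sync(buf)
        else:
            producer.send_async(buf)  # bug: buf is overwritten before flush()
    producer.flush()
    return producer.sent


pairs = [("r1", "r2"), ("r3", "r4")]
print(emit_validations(pairs, synchronous=False))  # [b'validated:r3,r4', b'validated:r3,r4']
print(emit_validations(pairs, synchronous=True))   # [b'validated:r1,r2', b'validated:r3,r4']
```

Emitting synchronously is one fix; handing the asynchronous send an immutable copy (`bytes(buf)`) instead of the shared buffer would be another.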

I am going to try to start the transitioner today, which should begin picking up the necessary resends for the test batches. If that works, and I see no evidence that resuming scheduled MCM1 batches will make another mess too big to clean up while live, I will resume the normal MCM1 range and schedule batches to go out at roughly the hourly rate the assimilators seemed capable of handling in the replay "stress test".
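The pacing idea in the paragraph above, releasing batches at about the hourly rate the assimilators handled, can be sketched as a fixed-interval schedule. The function name, the `headroom` safety factor, and all numbers below are hypothetical; the post gives no actual throughput figure, and the sketch ignores that results come back over hours or days rather than right after release.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def batch_release_times(start: datetime, n_batches: int, workunits_per_batch: int,
                        assimilated_per_hour: float, headroom: float = 0.8) -> list[datetime]:
    """Space batch releases so the workunit rate stays below assimilator throughput.

    headroom < 1 leaves spare capacity, e.g. for resends picked up by the transitioner.
    """
    if assimilated_per_hour <= 0 or not 0 < headroom <= 1:
        raise ValueError("need a positive throughput and 0 < headroom <= 1")
    usable_per_hour = assimilated_per_hour * headroom
    # Hours of usable assimilator capacity that one batch consumes.
    interval = timedelta(hours=workunits_per_batch / usable_per_hour)
    return [start + i * interval for i in range(n_batches)]


# Usage with made-up numbers: 10,000-workunit batches, 50,000 workunits/hour measured.
times = batch_release_times(datetime(2025, 10, 27, 0, 0), 3, 10_000, 50_000)
print([t.strftime("%H:%M") for t in times])  # ['00:00', '00:15', '00:30']
```

Keeping headroom below 1 trades some throughput for slack, so a backlog of resends or a slow validator does not immediately push the assimilators past what the replay showed they could handle.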
[Oct 26, 2025 7:43:49 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Project Status (First Post Updated)

Thanks for the update, Dylan, especially on the weekend...

I hope that, with all these updates and changes, reflecting all the work done on the Results page and producing the external stats again won't be forgotten.

Also, to pick up an almost stone-age topic: since you mentioned replaying stuff from the past, could that possibly, eventually, magically also include all those WUs that were returned 2 1/2 years ago, when WCG was restarted after the transition from IBM?
There were a lot of promises (and outright lies) back then, so I would at least welcome an honest answer about this, even if that answer is that those roughly 4 months of work results are all lost by now...

thanks,

Ralf
[Oct 26, 2025 8:26:48 PM]
dylanht
World Community Grid Tech
Joined: Jul 1, 2021
Post Count: 35
Re: Project Status (First Post Updated)

Hi Ralf,

Not forgotten. The new assimilator commits changes to the result state to the BOINC database, and that should be reflected on the Results page, which fetches that data from the database via the APIs. Once I reschedule the stats export scripts, which are currently disabled, the external stats at https://download.worldcommunitygrid.org/boinc/stats/ and the stats on the website under the Community tab should start updating again.

As for the communications, often based on my reporting, that turned into outright falsehoods: there is no excuse for that. We always planned and tried to achieve what we said we would do; we just failed almost every time, for years. As for that particular case, I cannot give a firm answer until we get to the end of the current issues. We were taking TSM backups of all NFS mounts from that period, and in theory those would include the BOINC database backups and files necessary to make good on that promise. Optimistically, there is a chance.

First, I must get all of MCM1, MAM1 (beta30, then launch), and ARP1 up and running again. Once I do, I can try the restore from TSM and get a firm answer.

Best,
Dylan
[Oct 26, 2025 10:04:15 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: Project Status (First Post Updated)

Thanks, Dylan,

My critique was aimed less at you than at a former "communications" person who was outright lying and then got all mad about it when it was pointed out. Let's hope that's in the past. It's just that, with all the communication issues over the last 3+ years now, it is at times really frustrating when there are these constant prolonged outages, which always seem kind of self-inflicted.

Let's see what happens next week,

thanks again,

Ralf
[Oct 27, 2025 1:54:06 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Re: Project Status (First Post Updated)

Dylan,

Thanks for the technical details!

I have a question about all the workunits that seem to have been stuck awaiting assimilation since the over-full disk issue, and all those stuck in Pending Validation because of the migration downtime. Will clearing those out require the old validator and assimilator, or have you got some means of feeding them through your new set-up?

Cheers - Al.

P.S. An attempt to preview or post a slightly longer version of this from my usual "postings" system got "403 - Forbidden" messages...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Oct 27, 2025 2:24:08 AM]
[Oct 27, 2025 2:16:55 AM]